In [92]:
#Loading libraries required for generation of dataframes for indel paper analysis.
library(ggplot2)
library(tidyverse)
library(tidyr)
library(ggpubr)
library(dplyr)
library(ggridges)
library(ineq)
library(RColorBrewer)
library(stringr)
library(gglorenz)
library(readr)
library(scales)
Rows: 11 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Genome, Accession, BRC ID, Feature ID, Annotation, Feature Type, St...
dbl (5): Genome ID, Start, End, Length, AA Length
lgl (7): RefSeq Locus Tag, Alt Locus Tag, FIGfam ID, PATRIC genus-specific f...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A spec_tbl_df: 11 × 21
GenomeGenome IDAccessionBRC IDRefSeq Locus TagAlt Locus TagFeature IDAnnotationFeature TypeStart⋯LengthStrandFIGfam IDPATRIC genus-specific families (PLfams)PATRIC cross-genus families (PGfams)Protein IDAA LengthGene SymbolProductGO
<chr><dbl><chr><chr><lgl><lgl><chr><chr><chr><dbl>⋯<dbl><chr><lgl><lgl><lgl><lgl><dbl><chr><chr><lgl>
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.3 NANAPATRIC.39054.14496.MW298156.mat_peptide.1715.2440.fwdPATRICmat_peptide1715⋯ 726+NANANANA2421C(VP3)VP3 (1C)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.4 NANAPATRIC.39054.14496.MW298156.mat_peptide.2441.3331.fwdPATRICmat_peptide2441⋯ 891+NANANANA2971D(VP1)VP1 (1D)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.5 NANAPATRIC.39054.14496.MW298156.mat_peptide.3332.3781.fwdPATRICmat_peptide3332⋯ 450+NANANANA1502A 2A NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.6 NANAPATRIC.39054.14496.MW298156.mat_peptide.3782.4078.fwdPATRICmat_peptide3782⋯ 297+NANANANA 992B 2B NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.7 NANAPATRIC.39054.14496.MW298156.mat_peptide.4079.5065.fwdPATRICmat_peptide4079⋯ 987+NANANANA3292C 2C NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.8 NANAPATRIC.39054.14496.MW298156.mat_peptide.5066.5323.fwdPATRICmat_peptide5066⋯ 258+NANANANA 863A 3A NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.9 NANAPATRIC.39054.14496.MW298156.mat_peptide.5324.5389.fwdPATRICmat_peptide5324⋯ 66+NANANANA 223B(VPg)3B NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.10NANAPATRIC.39054.14496.MW298156.mat_peptide.5390.5938.fwdPATRICmat_peptide5390⋯ 549+NANANANA1833C 3C NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.11NANAPATRIC.39054.14496.MW298156.mat_peptide.5939.7324.fwdPATRICmat_peptide5939⋯1386+NANANANA4623D 3D NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.1 NANAPATRIC.39054.14496.MW298156.mat_peptide.746.952.fwd PATRICmat_peptide 746⋯ 207+NANANANA 691A(VP4)VP4 (1A)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.2 NANAPATRIC.39054.14496.MW298156.mat_peptide.953.1714.fwd PATRICmat_peptide 953⋯ 762+NANANANA2541B(VP2)VP2 (1B)NA
In [135]:
#Reading in dataframes need for figure generation and analysis
allStickles_final_input <- read_csv("allStickles_final_input_handle.csv") 
alldel_final_input <- read_csv("alldel_final_input.csv") 
Capsid_P2_Scores_shared <- read_csv("Capsid_P2_Scores_shared.csv") 
Replication_P2_Scores_shared <- read_csv("Replication_P2_Scores_shared.csv") 
Scores_Insertional_Handle_Fullproteome <- read_csv("Scores_Insertional_Handle_Fullproteome.csv") 
Scores_Deletions_Fullproteome <- read_csv("Scores_Deletions_Fullproteome.csv") 
Scores_Deletions1AA_Fullproteome_1AA <- read_csv("Scores_Deletions1AA_Fullproteome_1AA.csv") 
Capsid_P2_Deletions_shared <- read_csv("Capsid_P2_Deletions_shared.csv") 
Replication_P2_Deletion_shared <- read_csv("Replication_P2_Deletion_shared.csv") 
Fullproteome_input_DMS_Enrich2_long <- read_csv("Fullproteome_input_DMS_Enrich2_long.csv") 
Fullproteome_P2_DMS_Enrich2_long <- read_csv("Fullproteome_P2_DMS_Enrich2_long.csv") 
Capsid_P2_DMS_Enrich2_shared <- read_csv("Capsid_P2_DMS_Enrich2_shared.csv") 
Replication_P2_DMS_Enrich2_shared <- read_csv("Replication_P2_DMS_Enrich2_shared.csv") 
allStickles_AA_final_input <- read_csv("allStickles_AA_final_input.csv") 
Fullproteome_1AA_Enrich2_long <- read_csv("Fullproteome_1AA_Enrich2_long.csv") 
Competition_Pool_Enrich2_Scores <- read_tsv("main_identifiers_scores_Competition_Pooled_Experiment.tsv", col_names = TRUE)
Scores_Competition_input <- read_tsv("Competition_Pool_Input.tsv", col_names = TRUE)
EV_Features <- read_csv("EV71_4643_Features.csv")
Rows: 2193 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): dataset, Seq
dbl (2): indel, n

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 6579 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): dataset, Seq
dbl (2): indel, n

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 862 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): insertion, dataset
dbl (4): RepA_score, RepB_score, RepC_score, indel

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1331 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): insertion, dataset
dbl (4): RepA_score, RepB_score, RepC_score, indel

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2193 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): indel, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 6579 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): dataset
dbl (2): indel, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2193 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): dataset
dbl (2): indel, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2490 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): indel, RepA_score, RepB_score, RepC_score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 3962 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): indel, RepA_score, RepB_score, RepC_score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 43860 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): AminoAcid
dbl (2): position, count

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 43860 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): AminoAcid
dbl (2): position, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 16378 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): hgvs, Aminoacid
dbl (7): RepA_DMS_SE, RepA_DMS_score, RepB_DMS_SE, RepB_DMS_score, RepC_DMS_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 25289 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): hgvs, Aminoacid
dbl (7): RepA_DMS_SE, RepA_DMS_score, RepB_DMS_SE, RepB_DMS_score, RepC_DMS_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 43860 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): dataset, Seq
dbl (2): indel, n

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 43860 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): AminoAcid
dbl (2): insertion, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1933 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): hgvs
dbl (3): SE, epsilon, score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1932 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): hgvs
dbl (1): count

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 11 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Genome, Accession, BRC ID, Feature ID, Annotation, Feature Type, St...
dbl (5): Genome ID, Start, End, Length, AA Length
lgl (7): RefSeq Locus Tag, Alt Locus Tag, FIGfam ID, PATRIC genus-specific f...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A spec_tbl_df: 11 × 21
GenomeGenome IDAccessionBRC IDRefSeq Locus TagAlt Locus TagFeature IDAnnotationFeature TypeStart⋯LengthStrandFIGfam IDPATRIC genus-specific families (PLfams)PATRIC cross-genus families (PGfams)Protein IDAA LengthGene SymbolProductGO
<chr><dbl><chr><chr><lgl><lgl><chr><chr><chr><dbl>⋯<dbl><chr><lgl><lgl><lgl><lgl><dbl><chr><chr><lgl>
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.3 NANAPATRIC.39054.14496.MW298156.mat_peptide.1715.2440.fwdPATRICmat_peptide1715⋯ 726+NANANANA2421C(VP3)VP3 (1C)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.4 NANAPATRIC.39054.14496.MW298156.mat_peptide.2441.3331.fwdPATRICmat_peptide2441⋯ 891+NANANANA2971D(VP1)VP1 (1D)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.5 NANAPATRIC.39054.14496.MW298156.mat_peptide.3332.3781.fwdPATRICmat_peptide3332⋯ 450+NANANANA1502A 2A NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.6 NANAPATRIC.39054.14496.MW298156.mat_peptide.3782.4078.fwdPATRICmat_peptide3782⋯ 297+NANANANA 992B 2B NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.7 NANAPATRIC.39054.14496.MW298156.mat_peptide.4079.5065.fwdPATRICmat_peptide4079⋯ 987+NANANANA3292C 2C NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.8 NANAPATRIC.39054.14496.MW298156.mat_peptide.5066.5323.fwdPATRICmat_peptide5066⋯ 258+NANANANA 863A 3A NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.9 NANAPATRIC.39054.14496.MW298156.mat_peptide.5324.5389.fwdPATRICmat_peptide5324⋯ 66+NANANANA 223B(VPg)3B NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.10NANAPATRIC.39054.14496.MW298156.mat_peptide.5390.5938.fwdPATRICmat_peptide5390⋯ 549+NANANANA1833C 3C NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.11NANAPATRIC.39054.14496.MW298156.mat_peptide.5939.7324.fwdPATRICmat_peptide5939⋯1386+NANANANA4623D 3D NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.1 NANAPATRIC.39054.14496.MW298156.mat_peptide.746.952.fwd PATRICmat_peptide 746⋯ 207+NANANANA 691A(VP4)VP4 (1A)NA
Enterovirus A71 Tainan/4643/9839054.14MW298156fig|39054.14496.mat_peptide.2 NANAPATRIC.39054.14496.MW298156.mat_peptide.953.1714.fwd PATRICmat_peptide 953⋯ 762+NANANANA2541B(VP2)VP2 (1B)NA
In [94]:
#Figure 1. Deep Insertion, Deletion, and Mutational Scanning of the EV-A71 proteome

#Bar plot of insertional handle input library 
G_input_handle<-ggplot(allStickles_final_input)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#8BC3E4")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10) 
G_input_handle
#Save figure
ggsave(plot = G_input_handle,filename = "meanPlot_inserts_input_handle.pdf",width =5,height=2)

#Ridgeline plot of insertional handle input library
Sublib <- c(0,77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193)
Sublib_label  <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28")
allStickles_final_input$cat <- cut(allStickles_final_input$indel, Sublib, Sublib_label)
allStickles_final_input_filter<- allStickles_final_input %>%group_by(cat)%>% mutate(Median=median(n+1))
Ridgeplot_input_lib <- ggplot(allStickles_final_input_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30)) + scale_x_log10(lim = c(50, 1000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_lib
#Save figure
ggsave(plot = Ridgeplot_input_lib,filename = "Ridge_plot.pdf",device = pdf,width =4,height=4)

#Lorenz curve and Gini Coefficient of insertional handle input library

# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(allStickles_final_input$n)
# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(allStickles_final_input$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")
# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)
# Calculate Gini coefficient for each category
gini_results <- tapply(allStickles_final_input_filter$n, allStickles_final_input_filter$cat, Gini)
# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")

# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
  lorenz_data <- Lc(n)
  lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
  return(lorenz_df)
}

# Calculate Lorenz curves for each category
lorenz_results <- by(allStickles_final_input_filter$n, allStickles_final_input_filter$cat, calculate_lorenz)

# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)

# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)

# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))

#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)

#Create a graph to plot Lorenz curves 
Lorenz_Curve_sublib <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:28) ))) +
  geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.22(0.08-0.34)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()
#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final  <- Lorenz_Curve_sublib+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final

#Save figure
ggsave(plot = Lorenz_Curve_Final,filename = "Lorenz_Curve_Final.pdf",device = pdf,width =4,height=4)

#Statistics included in the Figure 2 description
# Count positions with count more than 1
positions_above_1_insertional_handle <- sum(allStickles_final_input$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_insertional_handle/2193, "\n")


#Bar plot of deletion 3 bp input library 
alldel_final_input_3bp  <- filter(alldel_final_input,Seq=="3bp")
G_input_del<-ggplot(alldel_final_input_3bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del
ggsave(plot = G_input_del,filename = "meanPlot_input_3bp.pdf",width = 5,height=2)

#Ridgeline plot of deletion 3 bp input library 
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label  <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_3bp$cat <- cut(alldel_final_input_3bp$indel, Sublib, Sublib_label)
alldel_final_input_filter<-alldel_final_input_3bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_lib_del <- ggplot(alldel_final_input_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_lib_del
ggsave(plot = Ridgeplot_input_lib_del,filename = "Ridge_plot_del.pdf",device = pdf,width = 4,height=4)

#Lorenz curve and Gini Coefficient of deletion 3 bp input library 

# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_3bp$n)

# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_3bp$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")

# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)

# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_3bp$n, alldel_final_input_3bp$cat, Gini)

# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")

# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
  lorenz_data <- Lc(n)
  lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
  return(lorenz_df)
}

# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_3bp$n, alldel_final_input_3bp$cat, calculate_lorenz)

# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)

# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)

# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))

#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)

#Create a graph to plot Lorenz curves NA=grey=Masterdf)
Lorenz_Curve_sublib <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
  geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.36(0.06-0.46)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()

#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del  <- Lorenz_Curve_sublib+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted") +theme(legend.position = "none")
Lorenz_Curve_Final_del

#Save Figure
ggsave(plot = Lorenz_Curve_Final_del,filename = "Lorenz_Curve_Final_del.pdf",device = pdf,width =4,height=4)

#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_3bp <- sum(alldel_final_input_3bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_3bp/2193, "\n")


#Bar plot of DMS input library 
DMS_plot_input <- ggplot(Fullproteome_input_DMS_Enrich2_long)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=7),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count,fill="#3c7430"),binwidth = 5,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(51,102,153,204,255,306,357,408,459,510,561,612,662,712,762,812,862,916,970,1024,1078,1132,1186,1239,1292,1345,1398,1451,1504,1557,1610,1663,1716,1769,1822,1875,1928,1981,2034,2087,2140,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)+scale_fill_manual(values = "#3c7430")
ggsave(plot = DMS_plot_input,filename = "DMS_plot_input.pdf",width = 5,height=2)
DMS_plot_input

#DMS boxplot
Boxplot_DMS_Aminoacidcounts <- ggplot(Fullproteome_input_DMS_Enrich2_long, aes(y = count, x = factor(AminoAcid, levels = unique(AminoAcid)))) +ylim(0,400)+geom_boxplot(outlier.shape = NA)+theme_classic()+labs(y="Median Variant Count",x="")
Boxplot_DMS_Aminoacidcounts

#Save figure
ggsave(plot = Boxplot_DMS_Aminoacidcounts,filename = "Boxplot_DMS_Aminoacidcounts.pdf",width = 5,height=2)

#Statistics included in the Figure 2 description
#Count positions with count more than 1
Fullproteome_input_DMS_Enrich2_long_NAreplaced0 <- Fullproteome_input_DMS_Enrich2_long %>% replace(is.na(.), 0)
positions_above_1_DMS <- sum(Fullproteome_input_DMS_Enrich2_long_NAreplaced0$count > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_DMS/41667, "\n")
Picking joint bandwidth of 0.0384

Warning message:
“Removed 8 rows containing non-finite values (`stat_density_ridges()`).”
No description has been provided for this image
Picking joint bandwidth of 0.0384

Warning message:
“Removed 8 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.217829 
A data.frame: 28 × 2
catgini
<chr><dbl>
11 0.17690972
22 0.20266454
33 0.12101082
44 0.13268539
55 0.16395614
66 0.11511735
77 0.13078925
88 0.19452011
99 0.15415858
10100.33690447
11110.15459547
12120.33111207
13130.13254769
14140.14797611
15150.09756186
16160.13106871
17170.12301158
18180.11436645
19190.12500021
20200.11941092
21210.11839446
22220.11920143
23230.12470880
24240.10588332
25250.08427244
26260.11721940
27270.12847342
28280.09546884
Minimum value: 0.08427244 
Maximum value: 0.3369045 
No description has been provided for this image
Number of positions with count more than 1: 1 
No description has been provided for this image
Picking joint bandwidth of 0.034

Warning message:
“Removed 85 rows containing non-finite values (`stat_density_ridges()`).”
No description has been provided for this image
Picking joint bandwidth of 0.034

Warning message:
“Removed 85 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3600231 
A data.frame: 45 × 2
catgini
<chr><dbl>
11 0.13955728
22 0.08073368
33 0.09550911
44 0.06303381
55 0.08504353
66 0.08042455
77 0.07828678
88 0.38021542
99 0.38672246
10100.09471075
11110.07023564
12120.11295881
13130.06462484
14140.11095486
15150.06371616
16160.05777935
17170.06188949
18180.12894821
19190.14632343
20200.46131001
21210.10939927
22220.16916241
23230.10405539
24240.11942941
25250.14864221
26260.11736034
27270.08681335
28280.11075934
29290.09081287
30300.12239759
31310.09972610
32320.17898990
33330.10096861
34340.08777201
35350.12806367
36360.11599762
37370.11235481
38380.08494888
39390.15630112
40400.08997237
41410.07475519
42420.12888948
43430.08559853
44440.10863150
45450.18619782
Minimum value: 0.05777935 
Maximum value: 0.46131 
No description has been provided for this image
Number of positions with count more than 1: 0.998632 
Warning message:
“Removed 2193 rows containing non-finite values (`stat_summary_bin()`).”
Warning message:
“Removed 2193 rows containing non-finite values (`stat_summary_bin()`).”
No description has been provided for this image
Warning message:
“Removed 7412 rows containing non-finite values (`stat_boxplot()`).”
No description has been provided for this image
Warning message:
“Removed 7412 rows containing non-finite values (`stat_boxplot()`).”
Number of positions with count more than 1: 0.999952 
No description has been provided for this image
In [95]:
#Supplementary Figure 2. Deep Deletion Scanning of the EV-A71 proteome

#Bar plot of deletion 6 bp input library 
alldel_final_input_6bp  <- filter(alldel_final_input,Seq=="6bp")
G_input_del_6bp<-ggplot(alldel_final_input_6bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del_6bp
#Save figure
ggsave(plot = G_input_del_6bp,filename = "meanPlot_input_6bp.pdf",width = 5,height=2)

#Ridgeline plot of deletion 6 bp input library 
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label  <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_6bp$cat <- cut(alldel_final_input_6bp$indel, Sublib, Sublib_label)
alldel_final_input_6bp_filter<-alldel_final_input_6bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_6bp_lib_del <- ggplot(alldel_final_input_6bp_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_6bp_lib_del
#Save figure
ggsave(plot = Ridgeplot_input_6bp_lib_del,filename = "Ridge_plot_del_6bp.pdf",device = pdf,width = 4,height=4)

#Lorenz curve and Gini Coefficient of deletion 6 bp input library 

# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_6bp_filter$n)

# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_6bp_filter$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")

# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)

# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_6bp_filter$n, alldel_final_input_6bp_filter$cat, Gini)

# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")

# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
  lorenz_data <- Lc(n)
  lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
  return(lorenz_df)
}

# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_6bp_filter$n, alldel_final_input_6bp_filter$cat, calculate_lorenz)

# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)

# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)

# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))

#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)

#Create a graph to plot Lorenz curves 
Lorenz_Curve_sublib_6bp <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
  geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.34(0.06-0.47)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()

#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del_6bp  <- Lorenz_Curve_sublib_6bp+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final_del_6bp

#Save figure
ggsave(plot = Lorenz_Curve_Final_del_6bp,filename = "Lorenz_Curve_Final_del_6bp.pdf",device = pdf,width = 4,height=4)

#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_6bp <- sum(alldel_final_input_6bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_6bp/2193, "\n")

#Bar plot of deletion 9 bp input library 
alldel_final_input_9bp  <- filter(alldel_final_input,Seq=="9bp")
G_input_del_9bp<-ggplot(alldel_final_input_9bp)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 1,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n,fill=Seq),binwidth = 5,fun = function(X) {log10(sum(X))} )+scale_fill_manual(values = "#EB8F93")+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)
G_input_del_9bp

#Save figure
ggsave(plot = G_input_del_9bp,filename = "meanPlot_input_9bp.pdf",width = 5,height=2)

#Ridgeline plot of deletion 9 bp input library 
Sublib <- c(0,47,96,144,192,240,288,336,384,432,480,528,576,624,672,720,768,815,862,911,961,1011,1061,1111,1161,1211,1261,1310,1359,1408,1457,1506,1555,1604,1653,1702,1751,1800,1848,1898,1948,1997,2046,2095,2144,2193)
Sublib_label  <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45")
alldel_final_input_9bp$cat <- cut(alldel_final_input_9bp$indel, Sublib, Sublib_label)
alldel_final_input_9bp_filter<-alldel_final_input_9bp %>%group_by(cat)%>%mutate(Median=median(n+1))
Ridgeplot_input_9bp_lib_del <- ggplot(alldel_final_input_9bp_filter, aes(x = n+1, y = cat)) + geom_density_ridges2(aes(fill=Median))+scale_y_discrete(expand = c(0, 0), breaks=c(1, 10, 20, 30, 40, 50)) + scale_x_log10(lim = c(50, 3000))+ylab("Sublibraries")+ xlab("Count")
Ridgeplot_input_9bp_lib_del
ggsave(plot = Ridgeplot_input_9bp_lib_del,filename = "Ridge_plot_del_9bp.pdf",device = pdf,width = 4,height=4)

#Lorenz curve and Gini Coefficient of deletion 9 bp input library 

# Calculate Lorenz curve points for all of the dataset
lorenz_data_master <- ineq::Lc(alldel_final_input_9bp_filter$n)

# Calculate Gini coefficient
gini_coef_master <- ineq::Gini(alldel_final_input_9bp_filter$n)
cat("Gini Coefficient_master:", gini_coef_master, "\n")

# Create a dataframe for Lorenz curve plotting
lorenz_df_master <- data.frame(x = lorenz_data_master$p, y = lorenz_data_master$L)

# Calculate Gini coefficient for each category
gini_results <- tapply(alldel_final_input_9bp_filter$n, alldel_final_input_9bp_filter$cat, Gini)

# Convert results to a data frame
gini_df <- data.frame(cat = names(gini_results), gini = gini_results)
gini_df
min_value <- min(gini_df$gini)
max_value <- max(gini_df$gini)
cat("Minimum value:", min_value, "\n")
cat("Maximum value:", max_value, "\n")

# Create a function to calculate Lorenz curve
calculate_lorenz <- function(n) {
  lorenz_data <- Lc(n)
  lorenz_df <- data.frame(x = lorenz_data$p, y = lorenz_data$L)
  return(lorenz_df)
}

# Calculate Lorenz curves for each category
lorenz_results <- by(alldel_final_input_9bp_filter$n, alldel_final_input_9bp_filter$cat, calculate_lorenz)

# Convert the results to a list of data frames
lorenz_list <- lapply(lorenz_results, as.data.frame)

# Combine the list of data frames into a single data frame
lorenz_df <- do.call(rbind, lorenz_list)

# Add a column for category
lorenz_df$category <- rep(names(lorenz_results), sapply(lorenz_results, nrow))

#Bind master_df with sub-libraries
loren<-bind_rows(lorenz_df_master, lorenz_df)

#Create a graph to plot Lorenz curves NA=grey=Masterdf)
Lorenz_Curve_sublib_9bp <- ggplot(loren, aes(x = x, y = y, col=factor(category, levels=c(1:45) ))) +
  geom_line() +geom_abline(intercept = 0, slope = 1, linetype = "dashed") +labs(title = "Gini Coefficient=0.34(0.06-0.48)", x = "Cumulative Proportion of Variants",y = "Cumulative Proportion of Reads")+theme_classic()

#Create a new graph to plot Lorenz curve sub-library and the master df with a dotted line
Lorenz_Curve_Final_del_9bp  <- Lorenz_Curve_sublib_9bp+geom_line(data = loren[is.na(loren$category), ], color = "black", size = 1,linetype="dotted")+theme(legend.position = "none")
Lorenz_Curve_Final_del_9bp

#Save figure
ggsave(plot = Lorenz_Curve_Final_del_9bp,filename = "Lorenz_Curve_Final_del_9bp.pdf",device = pdf,width = 4,height=4)

#Statistics included in the Figure 2 description
#Count positions with count more than 1
positions_above_1_deletion_9bp <- sum(alldel_final_input_9bp$n > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_deletion_9bp/2193, "\n")

#Boxplot of different deletion sizes
Boxplot_Delsize_input <- ggplot(alldel_final_input, aes(y=n,x=Seq))+geom_boxplot(outlier.shape = NA)+ylim(0,500)+ylab("Median Variant Count")+xlab("")+theme_classic()
Boxplot_Delsize_input

#Save figure
ggsave(plot = Boxplot_Delsize_input,filename = "Boxplot_Delsize_input.pdf",device = pdf,width = 1,height=1)

#Scatter plots for different sizes of deletions

#Tidying dataframe for 3bp, creating a column 3bp with input counts
alldel_final_input_filtered_3bp <- filter(alldel_final_input,Seq=="3bp") 
alldel_final_input_filtered_3bp <-  pivot_wider(alldel_final_input_filtered_3bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_3bp <- select(alldel_final_input_filtered_3bp,indel,"3bp")

#Tidying dataframe for 6bp, creating a column 6bp with input counts
alldel_final_input_filtered_6bp <- filter(alldel_final_input,Seq=="6bp") 
alldel_final_input_filtered_6bp <-  pivot_wider(alldel_final_input_filtered_6bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_6bp <- select(alldel_final_input_filtered_6bp,indel,"6bp")

#Tidying dataframe for 9bp, creating a column 9bp with input counts
alldel_final_input_filtered_9bp <- filter(alldel_final_input,Seq=="9bp") 
alldel_final_input_filtered_9bp <-  pivot_wider(alldel_final_input_filtered_9bp,names_from = Seq, values_from =n)
alldel_final_input_filtered_9bp <- select(alldel_final_input_filtered_9bp,indel,"9bp")

#Merging all three dataframes
alldel_final_input_filtered_3_6_bp <- merge(alldel_final_input_filtered_3bp,alldel_final_input_filtered_6bp,by="indel",all.x=TRUE,all.y=TRUE)
alldel_final_input_filtered_3_9_bp <- merge(alldel_final_input_filtered_3bp,alldel_final_input_filtered_9bp,by="indel",all.x=TRUE,all.y=TRUE)
alldel_final_input_filtered_6_9_bp <- merge(alldel_final_input_filtered_6bp,alldel_final_input_filtered_9bp,by="indel",all.x=TRUE,all.y=TRUE)


#Plotting 1AA against 2AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_1AA_2AA <- ggplot(alldel_final_input_filtered_3_6_bp, aes(x = `3bp`, y = `6bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "1 AA", y = "2 AA") +theme_classic()
Scatter_input_1AA_2AA
ggsave(plot = Scatter_input_1AA_2AA,filename = "Scatter_input_1AA_2AA.pdf",device = pdf,width = 2,height=2)
model_1AA_2AA <- lm(`3bp` ~ `6bp`, data = alldel_final_input_filtered_3_6_bp)
summary_model_1AA_2AA <- summary(model_1AA_2AA)
summary_model_1AA_2AA

#Plotting 1AA against 3AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_1AA_3AA <- ggplot(alldel_final_input_filtered_3_9_bp, aes(x = `3bp`, y = `9bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "1 AA", y = "3 AA") +theme_classic()
Scatter_input_1AA_3AA
ggsave(plot = Scatter_input_1AA_3AA,filename = "Scatter_input_1AA_3AA.pdf",device = pdf,width = 2,height=2)
model_1AA_3AA <- lm(`3bp` ~ `9bp`, data = alldel_final_input_filtered_3_9_bp)
summary_model_1AA_3AA <- summary(model_1AA_3AA)
summary_model_1AA_3AA


#Plotting 1AA against 3AA deletions and then using a linear model to obtain correlation, R-squared
Scatter_input_2AA_3AA <- ggplot(alldel_final_input_filtered_6_9_bp, aes(x = `6bp`, y = `9bp`)) + geom_point(size = 0.05,color="#EB8F93") +scale_x_log10()+scale_y_log10()+
labs(x = "2 AA", y = "3 AA") +theme_classic()
Scatter_input_2AA_3AA
ggsave(plot = Scatter_input_2AA_3AA,filename = "Scatter_input_2AA_3AA.pdf",device = pdf,width = 2,height=2)
model_2AA_3AA <- lm(`6bp` ~ `9bp`, data = alldel_final_input_filtered_6_9_bp)
summary_model_2AA_3AA <- summary(model_2AA_3AA)
summary_model_2AA_3AA
Picking joint bandwidth of 0.0369

Warning message:
“Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
No description has been provided for this image
Picking joint bandwidth of 0.0369

Warning message:
“Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3375039 
A data.frame: 45 × 2
catgini
<chr><dbl>
11 0.20580339
22 0.10908092
33 0.11069086
44 0.06527017
55 0.11075620
66 0.08796619
77 0.10409583
88 0.46991261
99 0.32344657
10100.10035349
11110.08456274
12120.07399836
13130.08440298
14140.11709532
15150.09539333
16160.07071512
17170.10396686
18180.10768011
19190.15565094
20200.43460996
21210.13208113
22220.15789828
23230.20570313
24240.11960660
25250.16025306
26260.12935435
27270.11695801
28280.11278076
29290.09707268
30300.09029672
31310.10500336
32320.20945212
33330.08543672
34340.08426839
35350.12025691
36360.09713162
37370.08402851
38380.12017671
39390.09802442
40400.10237690
41410.08797313
42420.09456121
43430.07284639
44440.08917211
45450.19898368
Minimum value: 0.06527017 
Maximum value: 0.4699126 
No description has been provided for this image
Number of positions with count more than 1: 0.994072 
No description has been provided for this image
Picking joint bandwidth of 0.0378

Warning message:
“Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
No description has been provided for this image
Picking joint bandwidth of 0.0378

Warning message:
“Removed 71 rows containing non-finite values (`stat_density_ridges()`).”
Gini Coefficient_master: 0.3376191 
A data.frame: 45 × 2
catgini
<chr><dbl>
11 0.15179579
22 0.11317787
33 0.12487545
44 0.07973055
55 0.10663456
66 0.08536002
77 0.08228591
88 0.47617816
99 0.36233575
10100.08769753
11110.07148973
12120.08432235
13130.08868911
14140.10014758
15150.06397132
16160.07600504
17170.06726681
18180.15008007
19190.20519693
20200.47406300
21210.12644038
22220.19155743
23230.15228502
24240.14261428
25250.17640591
26260.13864841
27270.09946769
28280.09599150
29290.09180535
30300.09926261
31310.12243629
32320.24514666
33330.08974250
34340.11370892
35350.13667322
36360.08679486
37370.08880805
38380.12168116
39390.12165087
40400.09642445
41410.11122425
42420.10559112
43430.10377614
44440.08117261
45450.18701958
Minimum value: 0.06397132 
Maximum value: 0.4761782 
No description has been provided for this image
Number of positions with count more than 1: 0.99544 
Warning message:
“Removed 333 rows containing non-finite values (`stat_boxplot()`).”
No description has been provided for this image
Warning message:
“Removed 333 rows containing non-finite values (`stat_boxplot()`).”
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
No description has been provided for this image
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `3bp` ~ `6bp`, data = alldel_final_input_filtered_3_6_bp)

Residuals:
    Min      1Q  Median      3Q     Max 
-591.72  -47.53   -3.33   40.49 2410.24 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.109358   3.085098  -2.304   0.0213 *  
`6bp`        0.955911   0.007645 125.032   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 101.8 on 2191 degrees of freedom
Multiple R-squared:  0.8771,	Adjusted R-squared:  0.877 
F-statistic: 1.563e+04 on 1 and 2191 DF,  p-value: < 2.2e-16
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
No description has been provided for this image
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `3bp` ~ `9bp`, data = alldel_final_input_filtered_3_9_bp)

Residuals:
    Min      1Q  Median      3Q     Max 
-451.67  -48.93   -1.15   47.98  658.24 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -15.61006    2.91255   -5.36 9.22e-08 ***
`9bp`         1.02817    0.00761  135.11  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 95.03 on 2191 degrees of freedom
Multiple R-squared:  0.8928,	Adjusted R-squared:  0.8928 
F-statistic: 1.825e+04 on 1 and 2191 DF,  p-value: < 2.2e-16
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
No description has been provided for this image
Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”
Call:
lm(formula = `6bp` ~ `9bp`, data = alldel_final_input_filtered_6_9_bp)

Residuals:
     Min       1Q   Median       3Q      Max 
-2105.26   -40.21    -1.25    39.43   846.98 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.434942   2.903217   3.594 0.000332 ***
`9bp`        1.005189   0.007586 132.512  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 94.73 on 2191 degrees of freedom
Multiple R-squared:  0.8891,	Adjusted R-squared:  0.889 
F-statistic: 1.756e+04 on 1 and 2191 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [96]:
#Supplementary Figure 3. Deep Insertion Scanning of the EV-A71 proteome

#Bar plot of 5AA input library 
G_input_AA<-ggplot(allStickles_AA_final_input)+
geom_rect(data=EV_Features,aes(xmin=(Start-746)/3,xmax=(End-746)/3,ymin=0,ymax=5),alpha=rep(times = 20,c(0.15,0,0.15,0,0.15,0,0.15,0,0.15,0.15,0)))+
stat_summary_bin(show.legend = F,geom = "bar",aes(indel,n),binwidth = 5,fun = function(X) {log10(sum(X))} )+facet_grid(factor(Seq,c("A","C","D","E","F","G","H","I","K","L","M","N","P","Q","R","S","T","V","W","Y"))~.)+ facet_wrap(~Seq, ncol = 5)+
theme_classic()+ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+geom_vline(xintercept=c(77,157,235,314,392,470,548,626,706,784,862,940,1018,1096,1174,1252,1330,1409,1488,1566,1645,1724,1802,1881,1958,2037,2115,2193),linetype="dashed",colour="black",size=0.5,alpha=0.10)

#Save figure
ggsave(plot = G_input_AA,filename = "G_input_AA.pdf",width = 15,height=10)

#Print the result
positions_above_1_AA <- sum(allStickles_AA_final_input$n > 1)
cat("Number of positions with count more than 1:", positions_above_1_AA/43860, "\n")

#Calculate median count for different variants
Boxplot_1AA_input <- ggplot(allStickles_AA_final_input, aes(y=n,x=Seq))+geom_boxplot(outlier.shape = NA)+ylim(0,65)+ylab("Median Variant Count")+xlab("")+theme_classic()
Boxplot_1AA_input

# Calculate median for each unique sequence
unique_Seq <- unique(allStickles_AA_final_input$Seq)
Medians <- numeric(length(unique_Seq))
for (i in 1:length(unique_Seq)) {
  Medians[i] <- median(allStickles_AA_final_input$n[allStickles_AA_final_input$Seq == unique_Seq[i]])
}

# Create a data frame with unique sequences and their medians
Median_1AA_Seq <- data.frame(seq = unique_Seq, median_count = Medians)
Median_1AA_Seq
ggsave(plot = Boxplot_1AA_input,filename = "Boxplot_1AA_input.pdf",width = 3,height=2)
Saturation_1AA <- allStickles_AA_final_input %>% group_by(Seq) %>% summarize(saturation = sum(n > 1)/2193)
Saturation_1AA
Number of positions with count more than 1: 0.9763794 
Warning message:
“Removed 3542 rows containing non-finite values (`stat_boxplot()`).”
A data.frame: 20 × 2
seqmedian_count
<chr><dbl>
A16
C19
D18
E18
F15
G17
H20
I17
K19
L17
M16
N22
P17
Q18
R18
S19
T19
V16
W15
Y19
Warning message:
“Removed 3542 rows containing non-finite values (`stat_boxplot()`).”
A tibble: 20 × 2
Seqsaturation
<chr><dbl>
A0.9799362
C0.9799362
D0.9767442
E0.9794802
F0.9653443
G0.9776562
H0.9803922
I0.9762882
K0.9781122
L0.9671683
M0.9676243
N0.9872321
P0.9694482
Q0.9794802
R0.9772002
S0.9831281
T0.9840401
V0.9726402
W0.9648883
Y0.9808482
No description has been provided for this image
In [97]:
#Supplementary Figure 4. Deep Mutational Scanning of the EV-A71 proteome

#Heatmap for DMS input
# Convert Position to a factor with breaks
breaks_inputheatmap <- seq(0, 2193, by = 250)

# Create a function to assign labels based on position
get_label_input <- function(position) {
  sub_labels <- c("A", "B", "C", "D", "E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V")
  sub_index <- findInterval(position, breaks_inputheatmap)
  return(sub_labels[sub_index])
}

#create a colum sub for the different rows of the heatmap
Fullproteome_input_DMS_Enrich2_long$sub <- sapply(Fullproteome_input_DMS_Enrich2_long$position, get_label_input)

#Use amino acid as factors
Fullproteome_input_DMS_Enrich2_long$AminoAcid=factor(Fullproteome_input_DMS_Enrich2_long$AminoAcid,levels = c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))

#Heatmap of DMS input library
DMS_Heatmap_input <- ggplot(Fullproteome_input_DMS_Enrich2_long) + geom_tile(aes(position,AminoAcid, fill = log10(count))) +
    scale_fill_gradient2(limits = c(0,4),low = muted("white"), high = muted("aquamarine1"),space = "Lab",na.value = "black") +
    theme_classic() +  facet_wrap(sub ~ ., scales = "free_x",ncol=1) + theme(aspect.ratio = 0.1,axis.text.y = element_text(size = 3))+ theme(strip.text.x = element_blank())
DMS_Heatmap_input

#Save figure
ggsave(plot = DMS_Heatmap_input,filename = "DMS_Heatmap_input.pdf",width = 8,height=11)
No description has been provided for this image
In [98]:
#Supplementary Figure 5. Reproducibility of the Deep Insertion, Deletion, and Mutational Scanning experiments 

#Comparing the replicates for insertional handle library: Passage 2. Scatter plots of replicates against each other.
Capsid_P2_Scores_shared
Replication_P2_Scores_shared

#Passage 2 Replication

#RepA vs RepB Passage 2 Replication
Scatter_P2_insertion_replication_RepA_RepB <- ggplot(Replication_P2_Scores_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_insertion_replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepA_RepB,filename = "Scatter_P2_insertion_replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepB_replication <- lm(`RepA_score` ~ `RepB_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepB_replication <- summary(model_P2_insertion_RepA_RepB_replication)
summary_model_P2_insertion_RepA_RepB_replication

#RepA vs RepC Passage 2 Replication
Scatter_P2_insertion_replication_RepA_RepC <- ggplot(Replication_P2_Scores_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepA_RepC,filename = "Scatter_P2_insertion_replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepC_replication <- lm(`RepA_score` ~ `RepC_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepC_replication <- summary(model_P2_insertion_RepA_RepC_replication)
summary_model_P2_insertion_RepA_RepC_replication

#RepB vs RepC Passage 2 Replication
Scatter_P2_insertion_replication_RepB_RepC <- ggplot(Replication_P2_Scores_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_replication_RepB_RepC,filename = "Scatter_P2_insertion_replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepB_RepC_replication <- lm(`RepB_score` ~ `RepC_score`, data = Replication_P2_Scores_shared)
summary_model_P2_insertion_RepB_RepC_replication <- summary(model_P2_insertion_RepB_RepC_replication)
summary_model_P2_insertion_RepB_RepC_replication

#Passage 2 Capsid

#RepA vs RepB Passage 2 Capsid
Scatter_P2_insertion_RepA_RepB <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_insertion_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepA_RepB,filename = "Scatter_P2_insertion_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepB <- summary(model_P2_insertion_RepA_RepB)
summary_model_P2_insertion_RepA_RepB

#RepA vs RepC Passage 2 Capsid
Scatter_P2_insertion_RepA_RepC <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepA_RepC,filename = "Scatter_P2_insertion_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepA_RepC <- summary(model_P2_insertion_RepA_RepC)
summary_model_P2_insertion_RepA_RepC

#RepB vs RepC Passage 2 Capsid
Scatter_P2_insertion_RepB_RepC <- ggplot(Capsid_P2_Scores_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_insertion_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_insertion_RepB_RepC,filename = "Scatter_P2_insertion_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_insertion_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Capsid_P2_Scores_shared)
summary_model_P2_insertion_RepB_RepC <- summary(model_P2_insertion_RepB_RepC)
summary_model_P2_insertion_RepB_RepC
A spec_tbl_df: 862 × 6
insertionRepA_scoreRepB_scoreRepC_scoreindeldataset
<chr><dbl><dbl><dbl><dbl><chr>
p.(1handle) -4.955753-4.440084-4.490881 1handle
p.(2handle) -5.169184-4.653516-4.704312 2handle
p.(3handle) -5.054773-4.539105-4.589901 3handle
p.(4handle) -4.826284-4.310616-4.361413 4handle
p.(5handle) -4.684051-4.168383-4.219179 5handle
p.(6handle) -5.050274-4.534606-4.585402 6handle
p.(7handle) -5.027468-4.511800-4.562596 7handle
p.(8handle) -4.737862-4.222193-4.272990 8handle
p.(9handle) -4.859575-4.343906-4.394703 9handle
p.(10handle)-4.734775-4.219107-4.26990310handle
p.(11handle)-4.670958-4.155290-4.20608611handle
p.(12handle)-4.592150-4.076482-4.12727812handle
p.(13handle)-4.577787-4.062119-4.11291513handle
p.(14handle)-4.670958-4.155290-4.20608614handle
p.(15handle)-4.522010-4.006342-4.05713815handle
p.(16handle)-4.548427-4.032759-4.08355516handle
p.(17handle)-4.323886-3.808218-3.85901417handle
p.(18handle)-4.205228-3.689560-3.74035618handle
p.(19handle)-4.246365-3.730697-3.78149319handle
p.(20handle)-4.364987-3.849319-3.90011620handle
p.(21handle)-4.231136-3.715468-3.76626521handle
p.(22handle)-4.194674-3.679006-3.72980222handle
p.(23handle)-4.475035-3.959367-4.01016323handle
p.(24handle)-4.070571-3.554903-3.60569924handle
p.(25handle)-4.281022-3.765354-3.81615025handle
p.(26handle)-4.404466-3.888798-3.93959426handle
p.(27handle)-4.076559-3.560891-3.61168827handle
p.(28handle)-3.928856-3.413187-3.46398428handle
p.(29handle)-4.145751-3.630083-3.68087929handle
p.(30handle)-3.976162-3.460494-3.51129030handle
⋮⋮⋮⋮⋮⋮
p.(833handle)-3.8571402-3.3414722-3.3922684833handle
p.(834handle)-3.9007835-3.3851154-3.4359117834handle
p.(835handle)-4.0943113-3.5786433-3.6294395835handle
p.(836handle)-4.4044662-3.8887982-3.9395944836handle
p.(837handle)-4.4830190-3.9673510-4.0181472837handle
p.(838handle)-4.1457512-3.6300832-3.6808794838handle
p.(839handle)-4.0943113-3.5786433-3.6294395839handle
p.(840handle)-4.6098181-4.0941500-4.1449462840handle
p.(841handle)-4.3331666-3.8174985-3.8682947841handle
p.(842handle)-4.1345466-3.6188786-3.6696748842handle
p.(843handle)-4.1513068-3.6356387-3.6864350843handle
p.(844handle)-4.2052282-3.6895602-3.7403564844handle
p.(845handle)-4.5521444-4.0364764-4.0872726845handle
p.(846handle)-4.4629380-3.9472700-3.9980662846handle
p.(847handle)-4.4215242-3.9058562-3.9566524847handle
p.(848handle)-4.1786309-3.6629628-3.7137591848handle
p.(849handle)-3.9288555-3.4131875-3.4639837849handle
p.(850handle)-3.7472258-3.2315577-3.2823539850handle
p.(851handle)-3.2111887-2.6955207-2.7463169851handle
p.(852handle)-3.7878818-3.2722137-3.3230099852handle
p.(853handle)-4.4341294-3.9184614-3.9692576853handle
p.(854handle)-4.3514737-3.8358057-3.8866019854handle
p.(855handle)-3.7718172-3.2561491-3.3069453855handle
p.(856handle)-3.7388924-3.2232243-3.2740206856handle
p.(857handle)-4.3827263-3.8670582-3.9178544857handle
p.(858handle)-4.0705713-3.5549033-3.6056995858handle
p.(859handle) 1.5953199 0.5512497 0.8540936859handle
p.(860handle) 5.9323323 4.6930604 4.8115587860handle
p.(861handle) 5.8091086 5.1633204 5.7076631861handle
p.(862handle)-0.5538406-3.3340094-3.3848057862handle
A spec_tbl_df: 1331 × 6
insertionRepA_scoreRepB_scoreRepC_scoreindeldataset
<chr><dbl><dbl><dbl><dbl><chr>
p.(863handle) 5.733648 5.706546 5.5451139863handle
p.(864handle) 5.570286 5.669916 5.6042691864handle
p.(865handle) 5.273083 5.358577 5.4957911865handle
p.(866handle) 5.165721 5.183828 5.2658354866handle
p.(867handle) 4.962356 5.085517 4.7423284867handle
p.(868handle) 4.653415 4.296234 4.7650212868handle
p.(869handle) 4.074360 3.999124 4.4920633869handle
p.(870handle) 3.562570 3.512570 3.5321248870handle
p.(871handle) 1.660696 1.238419 1.5909884871handle
p.(872handle)-5.715586-5.822991-5.8704747872handle
p.(873handle) 1.264993 1.043876 0.2967718873handle
p.(874handle)-4.919616-5.027021-5.0745048874handle
p.(875handle)-4.749205-4.856610-4.9040936875handle
p.(876handle)-4.749205-4.856610-4.9040936876handle
p.(877handle)-5.088795-5.196200-5.2436835877handle
p.(878handle)-4.882053-4.989458-5.0369417878handle
p.(879handle)-5.163737-5.271141-5.3186249879handle
p.(880handle)-5.032780-5.140185-5.1876689880handle
p.(881handle)-4.919616-5.027021-5.0745048881handle
p.(882handle)-5.163737-5.271141-5.3186249882handle
p.(883handle)-4.901011-5.008416-5.0558996883handle
p.(884handle)-4.781466-4.888870-4.9363544884handle
p.(885handle)-4.955819-5.063224-5.1107078885handle
p.(886handle)-4.669618-4.777022-4.8245061886handle
p.(887handle)-4.843024-4.950428-4.9979123887handle
p.(888handle)-4.669618-4.777022-4.8245061888handle
p.(889handle)-4.928791-5.036195-5.0836791889handle
p.(890handle)-4.738216-4.845620-4.8931044890handle
p.(891handle)-4.595965-4.703369-4.7508532891handle
p.(892handle)-4.399531-4.506936-4.5544198892handle
⋮⋮⋮⋮⋮⋮
p.(2164handle)-5.049107-5.156512-5.2039962164handle
p.(2165handle)-5.149191-5.256595-5.3040792165handle
p.(2166handle)-4.760075-4.867479-4.9149632166handle
p.(2167handle)-4.964669-5.072073-5.1195572167handle
p.(2168handle)-5.156490-5.263895-5.3113792168handle
p.(2169handle)-4.621123-4.728528-4.7760122169handle
p.(2170handle)-5.104239-5.211644-5.2591282170handle
p.(2171handle)-5.080983-5.188387-5.2358712171handle
p.(2172handle)-5.088795-5.196200-5.2436832172handle
p.(2173handle)-4.681383-4.788787-4.8362712173handle
p.(2174handle)-5.096547-5.203951-5.2514352174handle
p.(2175handle)-4.937882-5.045286-5.0927702175handle
p.(2176handle)-5.049107-5.156512-5.2039962176handle
p.(2177handle)-5.096547-5.203951-5.2514352177handle
p.(2178handle)-5.119449-5.226853-5.2743372178handle
p.(2179handle)-5.185166-5.292570-5.3400542179handle
p.(2180handle)-5.126968-5.234372-5.2818562180handle
p.(2181handle)-5.016183-5.123587-5.1710712181handle
p.(2182handle)-5.065172-5.172576-5.2200602182handle
p.(2183handle)-4.919616-5.027021-5.0745052183handle
p.(2184handle)-5.134430-5.241835-5.2893192184handle
p.(2185handle)-4.982137-5.089541-5.1370252185handle
p.(2186handle)-5.156490-5.263895-5.3113792186handle
p.(2187handle)-4.982137-5.089541-5.1370252187handle
p.(2188handle)-5.292292-5.399696-5.4471802188handle
p.(2189handle)-5.065172-5.172576-5.2200602189handle
p.(2190handle)-5.080983-5.188387-5.2358712190handle
p.(2191handle)-4.862729-4.970133-5.0176172191handle
p.(2192handle)-4.862729-4.970133-5.0176172192handle
p.(2193handle)-4.193679-4.301084-4.3485682193handle
Call:
lm(formula = RepA_score ~ RepB_score, data = Replication_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6173 -0.0177 -0.0146 -0.0091  4.1874 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.032423   0.029248   1.109    0.268    
RepB_score  0.983905   0.005388 182.620   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2713 on 1329 degrees of freedom
Multiple R-squared:  0.9617,	Adjusted R-squared:  0.9616 
F-statistic: 3.335e+04 on 1 and 1329 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_score ~ RepC_score, data = Replication_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1609 -0.0158 -0.0116 -0.0044  3.2879 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.04727    0.02589   1.826   0.0681 .  
RepC_score   0.97876    0.00473 206.917   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2405 on 1329 degrees of freedom
Multiple R-squared:  0.9699,	Adjusted R-squared:  0.9699 
F-statistic: 4.281e+04 on 1 and 1329 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepB_score ~ RepC_score, data = Replication_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7130 -0.0092 -0.0040  0.0052  3.6628 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.098954   0.027692  -3.573 0.000365 ***
RepC_score   0.973222   0.005059 192.363  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2572 on 1329 degrees of freedom
Multiple R-squared:  0.9653,	Adjusted R-squared:  0.9653 
F-statistic: 3.7e+04 on 1 and 1329 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_score ~ RepB_score, data = Capsid_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.0007 -0.0292 -0.0246 -0.0208  6.9329 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.55265    0.06189   -8.93   <2e-16 ***
RepB_score   0.98417    0.01583   62.18   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4581 on 860 degrees of freedom
Multiple R-squared:  0.818,	Adjusted R-squared:  0.8178 
F-statistic:  3866 on 1 and 860 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_score ~ RepC_score, data = Capsid_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0037 -0.0270 -0.0004  0.0224  4.5160 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.82831    0.05718  -14.49   <2e-16 ***
RepC_score   0.90780    0.01449   62.66   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4553 on 860 degrees of freedom
Multiple R-squared:  0.8203,	Adjusted R-squared:  0.8201 
F-statistic:  3926 on 1 and 860 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepB_score ~ RepC_score, data = Capsid_P2_Scores_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6998 -0.0273  0.0150  0.0506  2.3964 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.53696    0.04620  -11.62   <2e-16 ***
RepC_score   0.85479    0.01171   73.02   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3678 on 860 degrees of freedom
Multiple R-squared:  0.8611,	Adjusted R-squared:  0.861 
F-statistic:  5332 on 1 and 860 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [99]:
#Comparing the replicates for deletion handle library: Passage 2 Scatter plots of replicates against each other.

#Passage 2 Replication shared Deletion

#RepA vs RepB Passage 2 Replication
Scatter_P2_deletion__Replication_RepA_RepB <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_deletion__Replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepA_RepB,filename = "Scatter_P2_deletion__Replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepA_RepB <- summary(model_P2_deletion_replication_RepA_RepB)
summary_model_P2_deletion_replication_RepA_RepB

#RepA vs RepC Passage 2 Replication
Scatter_P2_deletion__Replication_RepA_RepC <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_deletion__Replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepA_RepC,filename = "Scatter_P2_deletion__Replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepA_RepC <- summary(model_P2_deletion_replication_RepA_RepC)
summary_model_P2_deletion_replication_RepA_RepC

#RepB vs RepC Passage 2 Replication
Scatter_P2_deletion__Replication_RepB_RepC <- ggplot(Replication_P2_Deletion_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_deletion__Replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion__Replication_RepB_RepC,filename = "Scatter_P2_deletion__Replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_replication_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Replication_P2_Deletion_shared)
summary_model_P2_deletion_replication_RepB_RepC <- summary(model_P2_deletion_replication_RepB_RepC)
summary_model_P2_deletion_replication_RepB_RepC

#Passage 2 Capsid shared Deletion

#RepA vs RepB Passage 2 Capsid
Scatter_P2_deletion_RepA_RepB <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepA_score`, y = `RepB_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_deletion_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepA_RepB,filename = "Scatter_P2_deletion_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepA_RepB <- lm(`RepA_score` ~ `RepB_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepA_RepB <- summary(model_P2_deletion_RepA_RepB)
summary_model_P2_deletion_RepA_RepB

#RepA vs RepC Passage 2 Capsid
Scatter_P2_deletion_RepA_RepC <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepA_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_deletion_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepA_RepC,filename = "Scatter_P2_deletion_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepA_RepC <- lm(`RepA_score` ~ `RepC_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepA_RepC <- summary(model_P2_deletion_RepA_RepC)
summary_model_P2_deletion_RepA_RepC

#RepB vs RepC Passage 2 Capsid
Scatter_P2_deletion_RepB_RepC <- ggplot(Capsid_P2_Deletions_shared, aes(x = `RepB_score`, y = `RepC_score`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_deletion_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_deletion_RepB_RepC,filename = "Scatter_P2_deletion_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_deletion_RepB_RepC <- lm(`RepB_score` ~ `RepC_score`, data = Capsid_P2_Deletions_shared)
summary_model_P2_deletion_RepB_RepC <- summary(model_P2_deletion_RepB_RepC)
summary_model_P2_deletion_RepB_RepC
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepB_score, data = Replication_P2_Deletion_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9004 -0.0175 -0.0024  0.0173  3.4109 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.127498   0.017498  -7.286 3.82e-13 ***
RepB_score   0.935870   0.003587 260.942  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2241 on 3959 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9451,	Adjusted R-squared:  0.945 
F-statistic: 6.809e+04 on 1 and 3959 DF,  p-value: < 2.2e-16
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepC_score, data = Replication_P2_Deletion_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9536 -0.0180 -0.0052  0.0116  3.3866 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.041769   0.017104   2.442   0.0146 *  
RepC_score  0.945384   0.003417 276.683   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.212 on 3959 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9508,	Adjusted R-squared:  0.9508 
F-statistic: 7.655e+04 on 1 and 3959 DF,  p-value: < 2.2e-16
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepB_score ~ RepC_score, data = Replication_P2_Deletion_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6258 -0.0172 -0.0100 -0.0002  4.2126 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.021357   0.021839  -0.978    0.328    
RepC_score   0.968960   0.004363 222.102   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2707 on 3959 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9257,	Adjusted R-squared:  0.9257 
F-statistic: 4.933e+04 on 1 and 3959 DF,  p-value: < 2.2e-16
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepB_score, data = Capsid_P2_Deletions_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8571 -0.0107  0.0376  0.0802  3.1709 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.739523   0.015342    48.2   <2e-16 ***
RepB_score  0.795080   0.006856   116.0   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3094 on 2471 degrees of freedom
  (17 observations deleted due to missingness)
Multiple R-squared:  0.8448,	Adjusted R-squared:  0.8447 
F-statistic: 1.345e+04 on 1 and 2471 DF,  p-value: < 2.2e-16
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepA_score ~ RepC_score, data = Capsid_P2_Deletions_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5576 -0.0116  0.0522  0.1084  3.1848 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.694602   0.018359   37.83   <2e-16 ***
RepC_score  0.729229   0.007752   94.06   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3669 on 2471 degrees of freedom
  (17 observations deleted due to missingness)
Multiple R-squared:  0.7817,	Adjusted R-squared:  0.7816 
F-statistic:  8848 on 1 and 2471 DF,  p-value: < 2.2e-16
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 17 rows containing missing values (`geom_point()`).”
Call:
lm(formula = RepB_score ~ RepC_score, data = Capsid_P2_Deletions_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4108 -0.0115  0.0172  0.0434  4.6884 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.143661   0.017825   -8.06 1.18e-15 ***
RepC_score   0.876981   0.007527  116.51  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3563 on 2471 degrees of freedom
  (17 observations deleted due to missingness)
Multiple R-squared:  0.846,	Adjusted R-squared:  0.8459 
F-statistic: 1.358e+04 on 1 and 2471 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [100]:
#Passage 2 Replication shared DMS

#RepA vs RepB Passage 2 Replication
Scatter_P2_DMS__Replication_RepA_RepB <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepB_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_DMS__Replication_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepA_RepB,filename = "Scatter_P2_DMS__Replication_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepA_RepB <- lm(`RepA_DMS_score` ~ `RepB_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepA_RepB <- summary(model_P2_DMS_replication_RepA_RepB)
summary_model_P2_DMS_replication_RepA_RepB

#RepA vs RepC Passage 2 Replication
Scatter_P2_DMS__Replication_RepA_RepC <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Replication_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepA_RepC,filename = "Scatter_P2_DMS__Replication_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepA_RepC <- lm(`RepA_DMS_score` ~ `RepC_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepA_RepC <- summary(model_P2_DMS_replication_RepA_RepC)
summary_model_P2_DMS_replication_RepA_RepC

#RepB vs RepC Passage 2 Replication
Scatter_P2_DMS__Replication_RepB_RepC <- ggplot(Replication_P2_DMS_Enrich2_shared, aes(x = `RepB_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Replication_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Replication_RepB_RepC,filename = "Scatter_P2_DMS__Replication_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_replication_RepB_RepC <- lm(`RepB_DMS_score` ~ `RepC_DMS_score`, data = Replication_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_replication_RepB_RepC <- summary(model_P2_DMS_replication_RepB_RepC)
summary_model_P2_DMS_replication_RepB_RepC

#Passage 2 Capsid shared DMS

#RepA vs RepB Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepA_RepB <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepB_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate B") +theme_classic()
Scatter_P2_DMS__Capsid_RepA_RepB
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepA_RepB,filename = "Scatter_P2_DMS__Capsid_RepA_RepB.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepA_RepB <- lm(`RepA_DMS_score` ~ `RepB_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepA_RepB <- summary(model_P2_DMS_capsid_RepA_RepB)
summary_model_P2_DMS_capsid_RepA_RepB

#RepA vs RepC Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepA_RepC <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepA_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate A", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Capsid_RepA_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepA_RepC,filename = "Scatter_P2_DMS__Capsid_RepA_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepA_RepC <- lm(`RepA_DMS_score` ~ `RepC_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepA_RepC <- summary(model_P2_DMS_capsid_RepA_RepC)
summary_model_P2_DMS_capsid_RepA_RepC

#RepB vs RepC Passage 2 Capsid
Scatter_P2_DMS__Capsid_RepB_RepC <- ggplot(Capsid_P2_DMS_Enrich2_shared, aes(x = `RepB_DMS_score`, y = `RepC_DMS_score`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Replicate B", y = "Replicate C") +theme_classic()
Scatter_P2_DMS__Capsid_RepB_RepC
#Save Figure
ggsave(plot = Scatter_P2_DMS__Capsid_RepB_RepC,filename = "Scatter_P2_DMS__Capsid_RepB_RepC.pdf",device = pdf,width = 2,height=2)
#Creating a linear model
model_P2_DMS_capsid_RepB_RepC <- lm(`RepB_DMS_score` ~ `RepC_DMS_score`, data = Capsid_P2_DMS_Enrich2_shared)
summary_model_P2_DMS_capsid_RepB_RepC <- summary(model_P2_DMS_capsid_RepB_RepC)
summary_model_P2_DMS_capsid_RepB_RepC
Call:
lm(formula = RepA_DMS_score ~ RepB_DMS_score, data = Replication_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0238 -0.3853  0.0827  0.4296  4.6959 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.171714   0.007222  -23.78   <2e-16 ***
RepB_DMS_score  0.904960   0.002336  387.33   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8165 on 25287 degrees of freedom
Multiple R-squared:  0.8558,	Adjusted R-squared:  0.8558 
F-statistic: 1.5e+05 on 1 and 25287 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_DMS_score ~ RepC_DMS_score, data = Replication_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9821 -0.3803  0.0943  0.4233  4.1366 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.227119   0.006812  -33.34   <2e-16 ***
RepC_DMS_score  1.067687   0.002627  406.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7835 on 25287 degrees of freedom
Multiple R-squared:  0.8672,	Adjusted R-squared:  0.8672 
F-statistic: 1.651e+05 on 1 and 25287 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepB_DMS_score ~ RepC_DMS_score, data = Replication_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.8821 -0.3909  0.0810  0.4291  4.6228 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.221552   0.007011   -31.6   <2e-16 ***
RepC_DMS_score  1.090281   0.002704   403.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8063 on 25287 degrees of freedom
Multiple R-squared:  0.8654,	Adjusted R-squared:  0.8654 
F-statistic: 1.626e+05 on 1 and 25287 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_DMS_score ~ RepB_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6017 -0.3444  0.0706  0.3789  4.1668 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.149415   0.009167   -16.3   <2e-16 ***
RepB_DMS_score  0.940441   0.002738   343.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7283 on 16376 degrees of freedom
Multiple R-squared:  0.8781,	Adjusted R-squared:  0.8781 
F-statistic: 1.18e+05 on 1 and 16376 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepA_DMS_score ~ RepC_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4147 -0.4412  0.0611  0.3973  5.5852 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.210338   0.010317  -20.39   <2e-16 ***
RepC_DMS_score  0.946228   0.003172  298.30   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8223 on 16376 degrees of freedom
Multiple R-squared:  0.8446,	Adjusted R-squared:  0.8446 
F-statistic: 8.898e+04 on 1 and 16376 DF,  p-value: < 2.2e-16
No description has been provided for this image
Call:
lm(formula = RepB_DMS_score ~ RepC_DMS_score, data = Capsid_P2_DMS_Enrich2_shared)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.3287 -0.4338  0.0209  0.3516  5.7200 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.201484   0.009692  -20.79   <2e-16 ***
RepC_DMS_score  0.952428   0.002980  319.62   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7725 on 16376 degrees of freedom
Multiple R-squared:  0.8618,	Adjusted R-squared:  0.8618 
F-statistic: 1.022e+05 on 1 and 16376 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [101]:
#Figure 2. Fitness effects of insertions, deletions, and amino acid changes in EV-A71

#Bar plot of Insertional Handle Enrich2 P2
Scores_Insertional_Handle_Fullproteome_Plot <- ggplot()+stat_summary_bin(data = Scores_Insertional_Handle_Fullproteome, show.legend = F, geom = "bar", aes(indel, score), binwidth =1,fill="#1558a9")+  
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax = 8), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-6,8))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey',  lwd=0.75, yintercept=0)
ggsave(plot = Scores_Insertional_Handle_Fullproteome_Plot,filename = "Scores_Insertional_Handle_Fullproteome.pdf",width = 4,height=1.5)
Scores_Insertional_Handle_Fullproteome_Plot

#Bar plot of Deletion library Enrich2 P2
Scores_Deletions1AA_Fullproteome_plot <- ggplot()+stat_summary_bin(data = Scores_Deletions1AA_Fullproteome_1AA, show.legend = F, geom = "bar", aes(indel, score), binwidth =1,fill="#a91515")+  
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax =8), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-6,8))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey',  lwd=0.75, yintercept=0)
ggsave(plot = Scores_Deletions1AA_Fullproteome_plot,filename = "Scores_Deletions1AA_Fullproteome_plot.pdf",width = 4,height=1.5)
Scores_Deletions1AA_Fullproteome_plot

#Bar plot of DMS library Enrich2 P2
Scores_DMS_Fullproteome_plot <- ggplot()+stat_summary_bin(data = Fullproteome_P2_DMS_Enrich2_long, show.legend = F, geom = "bar", aes(position, score), binwidth =1,fill="#3c7430")+  
geom_rect(data = EV_Features, aes(xmin=(Start-746)/3,xmax=(End-746)/3, ymin = 0, ymax =3), alpha = rep(times = 1, c(0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0, 0.1, 0.1, 0)))+
theme_classic()+coord_cartesian(xlim=c(0,2193),ylim=c(-5,3))+xlab("Residue Position")+ylab("Enrich2 Score")+geom_hline(col='darkgrey',  lwd=0.75, yintercept=0)
ggsave(plot = Scores_DMS_Fullproteome_plot,filename = "Scores_DMS_Fullproteome_plot.pdf",width = 4,height=1.5)
Scores_DMS_Fullproteome_plot
No summary function supplied, defaulting to `mean_se()`
No summary function supplied, defaulting to `mean_se()`
Warning message:
“Removed 42 rows containing non-finite values (`stat_summary_bin()`).”
No summary function supplied, defaulting to `mean_se()`
Warning message:
“Removed 42 rows containing non-finite values (`stat_summary_bin()`).”
No summary function supplied, defaulting to `mean_se()`
No description has been provided for this image
Warning message:
“Removed 2193 rows containing non-finite values (`stat_summary_bin()`).”
No summary function supplied, defaulting to `mean_se()`
Warning message:
“Removed 2193 rows containing non-finite values (`stat_summary_bin()`).”
No summary function supplied, defaulting to `mean_se()`
No description has been provided for this image
No description has been provided for this image
In [102]:
#Figure 2: DMFE of insertions, deletions, and DMS

#Selecting only the score column and plotting a histogram for insertions
histo_insertions_sel  <- select(Scores_Insertional_Handle_Fullproteome,score)
histo_insertions_sel_plot <- ggplot(histo_insertions_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#1558a9"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
histo_insertions_sel_plot
#Save Figure
ggsave(plot = histo_insertions_sel_plot,filename = "histo_insertions_sel_plot.pdf",width = 2,height=1.5)
#Filtering df for insertions with a score higher than 0
histo_insertions_sel_nonlethal <- filter(histo_insertions_sel,score>0)
#Plotting a histogram for insertions with a score higher than 0
histo_insertions_sel_nonlethal_plot <- ggplot(histo_insertions_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#1558a9"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
histo_insertions_sel_nonlethal_plot
#Save Figure
ggsave(plot = histo_insertions_sel_nonlethal_plot,filename = "histo_insertions_sel_nonlethal_plot.pdf",width = 2,height=1.5)

#Selecting only the score column and plotting a histogram for deletions
histo_del_sel  <- select(Scores_Deletions_Fullproteome,score)
del_histogram <- ggplot(histo_del_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
del_histogram
#Save Figure
ggsave(plot = del_histogram,filename = "del_histogram.pdf",width = 2,height=1.5)
#Filtering df for deletions with a score higher than 0
histo_del_sel_nonlethal <- filter(histo_del_sel,score>0)
#Plotting a histogram for deletions with a score higher than 0
del_histogram_nonlethal <- ggplot(histo_del_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
del_histogram_nonlethal
#Save Figure
ggsave(plot = del_histogram_nonlethal,filename = "del_histogram_nonlethal.pdf",width = 2,height=1.5)

#Selecting only the score column and plotting a histogram for DMS
histo_DMS_sel  <- select(Fullproteome_P2_DMS_Enrich2_long,score)
histo_DMS_sel_plot <- ggplot(histo_DMS_sel, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#3c7430"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-6,6))
histo_DMS_sel_plot
#Save Figure
ggsave(plot = histo_DMS_sel_plot,filename = "histo_DMS_sel_plot.pdf",width = 2,height=1.5)
#Filtering df for DMS with a score higher than 0
histo_DMS_sel_nonlethal <- filter(histo_DMS_sel,score>0)
#Plotting a histogram for DMS with a score higher than 0
histo_DMS_sel_plot_nonlethal_plot <- ggplot(histo_DMS_sel_nonlethal, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = ("#3c7430"), color = "black") +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
histo_DMS_sel_plot_nonlethal_plot
#Save Figure
ggsave(plot = histo_DMS_sel_plot_nonlethal_plot,filename = "histo_DMS_sel_plot_nonlethal.pdf",width = 2,height=1.5)
No description has been provided for this image
Warning message:
“Removed 145 rows containing non-finite values (`stat_bin()`).”
No description has been provided for this image
Warning message:
“Removed 145 rows containing non-finite values (`stat_bin()`).”
No description has been provided for this image
Warning message:
“Removed 2193 rows containing non-finite values (`stat_bin()`).”
No description has been provided for this image
Warning message:
“Removed 2193 rows containing non-finite values (`stat_bin()`).”
No description has been provided for this image
No description has been provided for this image
In [103]:
##enrich2: 
## enrich2<0: low
## 0<enrich2<2 : mid
## enrich2>2: high

#Figure 2: Classification of variants into different enrich2 class scores, and comparing distribution for different viral proteins

# Create a mapping of positions to viral proteins
Viral_protein_Mapping <- data.frame(
  start = c(1, 70, 324,566,863,1013,1112,1441,1527,1549,1732),
  end = c(69, 323, 565,862,1012,1111,1440,1526,1548,1731,2193), 
  viralprotein = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))

#Setting colors for different enrich2 classes
fitness_class_colors <- c( "#202E38","#7B9CAF","#9DCCED")
fitness_class_colors_del <- c( "#441E1E","#BF7B7B","#FFD9D9")
fitness_class_colors_dms <- c( "#285e1b","#529143","#a6f492")

#Dataframe for insertional handle for comparison of enrich2 classes
Handle_P2_Fitness_Class <- Scores_Insertional_Handle_Fullproteome %>% mutate(viralprotein = sapply(indel, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)

#Adding scores classes depending on the enrich2 scores and plotting the area plot
Handle_P2_Fitness_Class  <- na.omit(Handle_P2_Fitness_Class)
Handle_P2_Fitness_Class$Class="N"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score<2,]$Class="N"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score<0,]$Class="L"
Handle_P2_Fitness_Class[Handle_P2_Fitness_Class$score>2,]$Class="B"
Handle_P2_Fitness_Class_counts <- table(Handle_P2_Fitness_Class$Class,Handle_P2_Fitness_Class$viralprotein)
chisq.test(Handle_P2_Fitness_Class_counts)
Handle_P2_Fitness_Class$viralprotein <- factor(Handle_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))  
Handle_P2_Fitness_Class$Class <- factor(Handle_P2_Fitness_Class$Class, levels = c("B", "N","L"))  
plot(as.factor(Handle_P2_Fitness_Class$viralprotein),as.factor(Handle_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors,lwd = 0.6)
pdf("Insertion_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(Handle_P2_Fitness_Class$viralprotein),as.factor(Handle_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors,lwd = 0.6)
dev.off()

#Dataframe for deletion library for comparison of enrich2 classes
Deletion3bp_P2_Fitness_Class <- Scores_Deletions1AA_Fullproteome_1AA %>% mutate(viralprotein = sapply(indel, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)

#Adding scores classes depending on the enrich2 scores and plotting the area plot
Deletion3bp_P2_Fitness_Class  <- na.omit(Deletion3bp_P2_Fitness_Class)
Deletion3bp_P2_Fitness_Class$Class="N"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score<2,]$Class="N"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score<0,]$Class="L"
Deletion3bp_P2_Fitness_Class[Deletion3bp_P2_Fitness_Class$score>2,]$Class="B"
Deletion3bp_P2_Fitness_Class_Counts <- table(Deletion3bp_P2_Fitness_Class$Class,Deletion3bp_P2_Fitness_Class$viralprotein)
chisq.test(Deletion3bp_P2_Fitness_Class_Counts)
Deletion3bp_P2_Fitness_Class$viralprotein <- factor(Deletion3bp_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))  
Deletion3bp_P2_Fitness_Class$Class <- factor(Deletion3bp_P2_Fitness_Class$Class, levels = c("B", "N","L"))  
plot(as.factor(Deletion3bp_P2_Fitness_Class$viralprotein),as.factor(Deletion3bp_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors_del,lwd = 0.6)
pdf("Deletion_Fitness_Class_3bp.pdf",width = 4, height = 4)
plot(as.factor(Deletion3bp_P2_Fitness_Class$viralprotein),as.factor(Deletion3bp_P2_Fitness_Class$Class),ylim=c(0.8,1.0),col = fitness_class_colors_del,lwd = 0.6)
dev.off()

#Dataframe for DMS  for comparison of enrich2 classes
DMS_P2_Fitness_Class <- Fullproteome_P2_DMS_Enrich2_long %>% mutate(viralprotein = sapply(position, function(pos) {
matching_row <- which(pos >= Viral_protein_Mapping$start & pos <= Viral_protein_Mapping$end)
if (length(matching_row) > 0) {return(Viral_protein_Mapping$viralprotein[matching_row])} else {return("No Protein")}})
)

#Adding scores classes depending on the enrich2 scores and plotting the area plot
DMS_P2_Fitness_Class  <- na.omit(DMS_P2_Fitness_Class)
DMS_P2_Fitness_Class$Class="N"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score<2,]$Class="N"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score<0,]$Class="L"
DMS_P2_Fitness_Class[DMS_P2_Fitness_Class$score>2,]$Class="B"
DMS_P2_Fitness_Class_Counts <- table(DMS_P2_Fitness_Class$Class,DMS_P2_Fitness_Class$viralprotein)
chisq.test(DMS_P2_Fitness_Class_Counts)
DMS_P2_Fitness_Class$viralprotein <- factor(DMS_P2_Fitness_Class$viralprotein, levels = c("VP4", "VP2", "VP3","VP1","2A","2B","2C","3A","3B","3C","3D"))  
DMS_P2_Fitness_Class$Class <- factor(DMS_P2_Fitness_Class$Class, levels = c("B", "N","L"))  
plot(as.factor(DMS_P2_Fitness_Class$viralprotein),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
pdf("DMS_P2_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(DMS_P2_Fitness_Class$viralprotein),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
dev.off()
Warning message in chisq.test(Handle_P2_Fitness_Class_counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  Handle_P2_Fitness_Class_counts
X-squared = 140.43, df = 20, p-value < 2.2e-16
png: 2
Warning message in chisq.test(Deletion3bp_P2_Fitness_Class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  Deletion3bp_P2_Fitness_Class_Counts
X-squared = 84.685, df = 20, p-value = 6.204e-10
No description has been provided for this image
png: 2
	Pearson's Chi-squared test

data:  DMS_P2_Fitness_Class_Counts
X-squared = 1793.8, df = 20, p-value < 2.2e-16
No description has been provided for this image
png: 2
No description has been provided for this image
In [104]:
#Supplementary Figure 6.  Enrich2 scores for insertions, deletions, and amino acid changes across the EV-A71 proteome

#Making insertional df compatible for merge
Scores_Insertional_Handle_Fullproteome_formerge <-  Scores_Insertional_Handle_Fullproteome %>% mutate(AminoAcid = "Ins")
Scores_Insertional_Handle_Fullproteome_formerge  <- Scores_Insertional_Handle_Fullproteome_formerge %>% rename(position = colnames(Scores_Insertional_Handle_Fullproteome_formerge)[1])
#Making deletional library df compatible for merge
Scores_Deletions_Fullproteome_formerge  <- Scores_Deletions_Fullproteome %>% rename(AminoAcid = colnames(Scores_Deletions_Fullproteome)[3])
Scores_Deletions_Fullproteome_formerge  <- Scores_Deletions_Fullproteome_formerge %>% rename(position = colnames(Scores_Deletions_Fullproteome_formerge)[1])
#Merging all dataframes
merged_df_indel_DMS <- merge(Scores_Insertional_Handle_Fullproteome_formerge, Scores_Deletions_Fullproteome_formerge, by = c("position","AminoAcid","score"),all = TRUE)
merged_df_indel_DMS <- merge(merged_df_indel_DMS, Fullproteome_P2_DMS_Enrich2_long, by = c("position","AminoAcid","score"),all = TRUE)
merged_df_indel_DMS$AminoAcid=factor(merged_df_indel_DMS$AminoAcid,levels = c("3AAdel","2AAdel","1AAdel","Ins","P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
write.table(merged_df_indel_DMS,file="merged_df_indel_DMS.csv",row.names=FALSE,sep=",")
# Convert Position to a factor with breaks
breaks <- seq(0, 2193, by = 250)

# Creating a function to assign labels based on position
get_label <- function(position) {
  sub_labels <- c("A", "B", "C", "D", "E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V")
  sub_index <- findInterval(position, breaks)
  return(sub_labels[sub_index])
}

merged_df_indel_DMS$sub <- sapply(merged_df_indel_DMS$position, get_label)

#Plotting the heatmap of the complete proteome
DMS_indel_Heatmap_fullproteome <- ggplot(merged_df_indel_DMS) +
    geom_tile(aes(position, AminoAcid, fill = score)) +
    scale_fill_gradient2(midpoint = 0,limits = c(-8,8),low = muted("orchid2"), mid = "white",
                         high = muted("aquamarine1"),space = "Lab",na.value = "black") + theme_classic() +
                        facet_wrap(sub ~ ., scales = "free_x",ncol=1) + theme(aspect.ratio = 0.1,axis.text.y = element_text(size = 3))+ 
                        theme(strip.text.x = element_blank())

#Save figure
ggsave(plot = DMS_indel_Heatmap_fullproteome,filename = "DMS_indel_Heatmap_fullproteome.pdf",width = 8,height=11)

DMS_indel_Heatmap_fullproteome
No description has been provided for this image
In [105]:
#Figure 3. Impact of altering insertion sequence, deletion length, and amino acid residue substitution on EV-A71 growth

#Creating a factor for Amino acid changes and plotting area plot for different amino acids
DMS_P2_Fitness_Class$AminoAcid=factor(DMS_P2_Fitness_Class$AminoAcid,levels = c("P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
DMS_P2_Fitness_Class_Counts_AA <- table(DMS_P2_Fitness_Class$Class,DMS_P2_Fitness_Class$AminoAcid)
chisq.test(DMS_P2_Fitness_Class_Counts_AA)
plot(as.factor(DMS_P2_Fitness_Class$AminoAcid),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
pdf("DMS_P2_Fitness_Class_Aminoacids.pdf",width = 4, height = 4)
plot(as.factor(DMS_P2_Fitness_Class$AminoAcid),as.factor(DMS_P2_Fitness_Class$Class),ylim=c(0.5,1.0),col = fitness_class_colors_dms,lwd = 0.6)
dev.off()

#Area plot for deletions of different lengths
Deletion_P2_Fitness_Class <- Scores_Deletions_Fullproteome
Deletion_P2_Fitness_Class  <- na.omit(Deletion_P2_Fitness_Class)
Deletion_P2_Fitness_Class$Class="N"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score<2,]$Class="N"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score<0,]$Class="L"
Deletion_P2_Fitness_Class[Deletion_P2_Fitness_Class$score>2,]$Class="B"
Deletion_P2_Fitness_Class_Counts <- table(Deletion_P2_Fitness_Class$Class,Deletion_P2_Fitness_Class$dataset)
chisq.test(Deletion_P2_Fitness_Class_Counts)
Deletion_P2_Fitness_Class$Class <- factor(Deletion_P2_Fitness_Class$Class, levels = c("B", "N","L"))  
plot(as.factor(Deletion_P2_Fitness_Class$dataset),as.factor(Deletion_P2_Fitness_Class$Class),ylim=c(0.95,1.0),col = fitness_class_colors_del,lwd = 0.6)
pdf("Deletion_Fitness_Class_3_6_9_bp.pdf",width = 4, height = 4)
plot(as.factor(Deletion_P2_Fitness_Class$dataset),as.factor(Deletion_P2_Fitness_Class$Class),ylim=c(0.95,1.0),col = fitness_class_colors_del,lwd = 0.6)
dev.off()

#Creating a factor for the 5 AA insertion library using inserted amino acid as factor 
Fullproteome_1AA_Enrich2_long$AminoAcid=factor(Fullproteome_1AA_Enrich2_long$AminoAcid,levels = c("P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
Fullproteome_1AA_Enrich2_long$sub <- sapply(Fullproteome_1AA_Enrich2_long$insertion, get_label)

#Heat map function for the 5 AA insertion library
create_AA_heatmap <- function(lower_bound, upper_bound) {
  df <- filter(Fullproteome_1AA_Enrich2_long, insertion > lower_bound, insertion < upper_bound)
  
  SingleAA_Heatmap <- ggplot(df) +
    geom_tile(aes(insertion, AminoAcid, fill = score)) +
    scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
                         low = muted("orchid2"),
                         mid = "white",
                         high = muted("aquamarine1"),
                         space = "Lab",
                         na.value = "black") +
    theme_classic() +coord_fixed()
  
  filename <- paste0("SingleAA_Heatmap_", lower_bound, "_to_", upper_bound, ".pdf")
  ggsave(filename, height = 6, width = 6)

  return(SingleAA_Heatmap)
}

#Heatmaps for the sites at N-termini of VP1/2A/3A and VP3
create_AA_heatmap(lower_bound = 399, upper_bound = 451)
create_AA_heatmap(lower_bound = 549, upper_bound = 601)
create_AA_heatmap(lower_bound = 849, upper_bound = 901)
create_AA_heatmap(lower_bound = 1440, upper_bound = 1492)

# Area plot for 5 AA insertion library
SingleAA_fullproteome_Fitness_Class <- Fullproteome_1AA_Enrich2_long
SingleAA_fullproteome_Fitness_Class  <- na.omit(SingleAA_fullproteome_Fitness_Class)
SingleAA_fullproteome_Fitness_Class$Class="N"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score<2,]$Class="N"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score<0,]$Class="L"
SingleAA_fullproteome_Fitness_Class[SingleAA_fullproteome_Fitness_Class$score>2,]$Class="B"
SingleAA_fullproteome_Fitness_Class_Counts <- table(SingleAA_fullproteome_Fitness_Class$Class,SingleAA_fullproteome_Fitness_Class$AminoAcid)
chisq.test(SingleAA_fullproteome_Fitness_Class_Counts)
SingleAA_fullproteome_Fitness_Class$Class <- factor(SingleAA_fullproteome_Fitness_Class$Class, levels = c("B", "N","L"))  
plot(as.factor(SingleAA_fullproteome_Fitness_Class$AminoAcid),as.factor(SingleAA_fullproteome_Fitness_Class$Class),ylim=c(0.97,1),col = fitness_class_colors,lwd = 0.6)
pdf("SingleAA_fullproteome_Fitness_Class.pdf",width = 4, height = 4)
plot(as.factor(SingleAA_fullproteome_Fitness_Class$AminoAcid),as.factor(SingleAA_fullproteome_Fitness_Class$Class),ylim=c(0.97,1),col = fitness_class_colors,lwd = 0.6)
dev.off()

#Creating a dataframe with only deletions and then creating bar plots for different deletion lengths at InDel hotspots
merged_df_del_only <- filter(merged_df_indel_DMS,AminoAcid %in% c("1AAdel", "2AAdel","3AAdel"))
merged_df_del_only <- select(merged_df_del_only,position,AminoAcid,score)
merged_df_del_only$AminoAcid=factor(merged_df_del_only$AminoAcid,levels = c("1AAdel", "2AAdel","3AAdel"))

#Bar plot at VP1 N-terminus cleavage site  
Deletion_N_term_VP1  <- filter(merged_df_del_only,position>549,position<601)
vp1_nterminus_del <- ggplot(Deletion_N_term_VP1)+
  geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
  theme(axis.text.x= element_text(angle=90))+ ylab("Enrich2 score") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
vp1_nterminus_del
#Save figure
ggsave(plot = vp1_nterminus_del,filename = "vp1_nterminus_del_hotspot2.pdf",width = 7,height=4)

#Bar plot at VP1-2A cleavage site  
Deletion_Cleavage_VP1  <- filter(merged_df_del_only,position>849,position<901)
vp1_2a_cleavage <- ggplot(Deletion_Cleavage_VP1)+
  geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
  theme(axis.text.x= element_text(angle=90))+ ylab("Relative Fitness") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
vp1_2a_cleavage
#Save figure
ggsave(plot = vp1_2a_cleavage,filename = "vp1_2a_cleavage_hotspot3.pdf",width = 7,height=4)

#Bar plot at 3A N-terminus 
Deletion_3A_N_term  <- filter(merged_df_del_only,position>1440,position<1492)
del3Adel <- ggplot(Deletion_3A_N_term)+
  geom_bar(aes(factor(position),score,fill=AminoAcid),stat = 'identity',position="dodge")+theme_classic()+scale_fill_manual(values =c("#8E8E8E", "#686868", "#444343"))+
  theme(axis.text.x= element_text(angle=90))+ ylab("Relative Fitness") + xlab("Amino Acid Position")+facet_grid(AminoAcid~.)+theme(legend.position = "none")+coord_fixed(ylim=c(-4,4))
del3Adel
#Save figure
ggsave(plot = del3Adel,filename = "del3Adel_hotspot4.pdf",width = 7,height=4)
	Pearson's Chi-squared test

data:  DMS_P2_Fitness_Class_Counts_AA
X-squared = 991.35, df = 38, p-value < 2.2e-16
png: 2
	Pearson's Chi-squared test

data:  Deletion_P2_Fitness_Class_Counts
X-squared = 63.252, df = 4, p-value = 6.007e-13
No description has been provided for this image
png: 2
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
	Pearson's Chi-squared test

data:  SingleAA_fullproteome_Fitness_Class_Counts
X-squared = 108.69, df = 38, p-value = 9.873e-09
No description has been provided for this image
png: 2
No description has been provided for this image
Warning message:
“Removed 2 rows containing missing values (`geom_bar()`).”
No description has been provided for this image
Warning message:
“Removed 2 rows containing missing values (`geom_bar()`).”
No description has been provided for this image
No description has been provided for this image
In [106]:
#Figure 4. Structural interpretation of InDel fitness effects for 2A(pro) and 3D(pol) 
#Supplementary Figure 8. Structural interpretation of mutational effects for 3A, 2C, and 3C(pro) 

#Parsing dataframe for exponentiated score as an attribute for chimera for the 2A(pro): Insertional Handle
Insertions_2A <- filter(Scores_Insertional_Handle_Fullproteome,indel>862,indel<1013)
Insertions_2A <- na.omit(Insertions_2A)
Insertions_2A_reindexed <- Insertions_2A %>% mutate(indel=indel-862)
Insertions_2A_reindexed$indel <- paste0(Insertions_2A_reindexed$indel, ".")
Insertions_2A_reindexed$indel <- paste0(Insertions_2A_reindexed$indel, "A")
Insertions_2A_reindexed$indel <- paste(":", Insertions_2A_reindexed$indel, sep = "")
write.csv(Insertions_2A_reindexed,file ="2A_Chimera_Attached_Insertions.csv",)
In [107]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the 2A(pro): Deletion 1 AA 
Deletions_2A <- filter(Scores_Deletions1AA_Fullproteome_1AA,indel>862,indel<1013)
Deletions_2A <- na.omit(Deletions_2A)
Deletions_2A_reindexed <- Deletions_2A %>% mutate(indel=indel-862)
Deletions_2A_reindexed$indel <- paste0(Deletions_2A_reindexed$indel, ".")
Deletions_2A_reindexed$indel <- paste0(Deletions_2A_reindexed$indel, "A")
Deletions_2A_reindexed$indel <- paste(":", Deletions_2A_reindexed$indel, sep = "")
write.csv(Deletions_2A_reindexed,file ="2A_Chimera_Attached_Deletions.csv",)
In [108]:
#Parsing dataframe for mean exponentiated score as an attribute for chimera for the 2A(pro): DMS
DMS_2A <- filter(Fullproteome_P2_DMS_Enrich2_long,position>862,position<1013)
DMS_2A <-  DMS_2A %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
DMS_2A <- DMS_2A %>% replace(is.na(.), "")
DMS_2A_reindexed <- DMS_2A %>% mutate(position=position-862)
DMS_2A_reindexed$position <- paste0(DMS_2A_reindexed$position, ".")
DMS_2A_reindexed$position <- paste0(DMS_2A_reindexed$position, "A")
DMS_2A_reindexed$position <- paste(":", DMS_2A_reindexed$position, sep = "")
write.csv(DMS_2A_reindexed,file ="DMS_2A_reindexed_chimera.csv",)
DMS_2A_reindexed
A tibble: 150 × 2
positionscore
<chr><dbl>
:1.A -3.2583277
:2.A 1.6086559
:3.A 2.2672482
:4.A 1.6855832
:5.A 1.5026366
:6.A 1.2624536
:7.A 1.5661049
:8.A 0.6479698
:9.A -1.3420372
:10.A-1.9844672
:11.A 0.3477806
:12.A-1.1440086
:13.A-1.1640856
:14.A-0.3041383
:15.A-1.8627696
:16.A-2.5558196
:17.A-1.9734187
:18.A-1.6247600
:19.A-1.2573557
:20.A-1.6619071
:21.A-2.6747915
:22.A-1.0767590
:23.A-2.3566803
:24.A-0.1215330
:25.A 1.9866321
:26.A 1.6037350
:27.A-2.6100113
:28.A-1.4032755
:29.A 1.4954758
:30.A-0.3345598
⋮⋮
:121.A-1.8821218
:122.A-3.3446922
:123.A-2.2803981
:124.A-2.1335697
:125.A-2.7440641
:126.A-1.6002241
:127.A-3.0068319
:128.A-1.3730460
:129.A 1.2482721
:130.A-1.0821599
:131.A 0.6388990
:132.A-2.0803571
:133.A-2.7253275
:134.A-3.0507907
:135.A-1.8450757
:136.A-1.9163833
:137.A-2.1605476
:138.A-2.8020985
:139.A-2.5567497
:140.A-2.1396983
:141.A 0.9739174
:142.A-2.6103306
:143.A-2.4713307
:144.A-2.3159851
:145.A-1.6662514
:146.A-1.0447044
:147.A-1.9941271
:148.A-1.2372224
:149.A-0.7805746
:150.A-3.2530176
In [109]:
#STRIDE of 2A(pro)
STRIDEP71_2A <- readr::read_tsv(file.path("STRIDE_3W95.txt"),col_names = TRUE)
STRIDEP71_2A_DMS <- STRIDEP71_2A %>% rename(position = colnames(STRIDEP71_2A)[1])

#Areaplot enrich2 score class and secondary structures for insertions
NS2A_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>862,indel<1013)
NS2A_2ndary_insertion <- cbind(NS2A_2ndary_insertion, STRIDEP71_2A$Secondary)
colnames(NS2A_2ndary_insertion)[colnames(NS2A_2ndary_insertion) == "STRIDEP71_2A$Secondary"] <- "Secondary"
NS2A_2ndary_insertion <- NS2A_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_insertion <- NS2A_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_insertion_fitness_class  <- na.omit(NS2A_2ndary_insertion)
NS2A_2ndary_insertion_fitness_class$Class="N"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score<0,]$Class="L"
NS2A_2ndary_insertion_fitness_class[NS2A_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS2A_2ndary_insertion_fitness_class_Counts <- table(NS2A_2ndary_insertion_fitness_class$Class,NS2A_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS2A_2ndary_insertion_fitness_class_Counts)
NS2A_2ndary_insertion_fitness_class$Class <- factor(NS2A_2ndary_insertion_fitness_class$Class, levels = c("B","N","L"))  
plot(as.factor(NS2A_2ndary_insertion_fitness_class$Secondary),as.factor(NS2A_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_insertion_fitness_class$Secondary),as.factor(NS2A_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for deletions
NS2A_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS2A_2ndary_del <- NS2A_2ndary_del %>% ungroup() 
NS2A_2ndary_del <- filter(NS2A_2ndary_del,indel>862,indel<1013)
NS2A_2ndary_del <- cbind(NS2A_2ndary_del, STRIDEP71_2A$Secondary)
colnames(NS2A_2ndary_del)[colnames(NS2A_2ndary_del) == "STRIDEP71_2A$Secondary"] <- "Secondary"
NS2A_2ndary_del <- NS2A_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_del <- NS2A_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_del_fitness_class  <- na.omit(NS2A_2ndary_del)
NS2A_2ndary_del_fitness_class$Class="N"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score<2,]$Class="N"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score<0,]$Class="L"
NS2A_2ndary_del_fitness_class[NS2A_2ndary_del_fitness_class$score>2,]$Class="B"
NS2A_2ndary_del_fitness_class_Counts <- table(NS2A_2ndary_del_fitness_class$Class,NS2A_2ndary_del_fitness_class$Secondary)
chisq.test(NS2A_2ndary_del_fitness_class_Counts)
NS2A_2ndary_del_fitness_class$Class <- factor(NS2A_2ndary_del_fitness_class$Class, levels = c("B", "N","L"))  
plot(as.factor(NS2A_2ndary_del_fitness_class$Secondary),as.factor(NS2A_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_del_fitness_class$Secondary),as.factor(NS2A_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for DMS
NS2A_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS2A_2ndary_DMS <- filter(NS2A_2ndary_DMS,position>862,position<1013)
NS2A_2ndary_DMS <- select(NS2A_2ndary_DMS,position,score,AminoAcid)
NS2A_2ndary_DMS <- mutate(NS2A_2ndary_DMS,position=position-862)
NS2A_2ndary_DMS <- merge(NS2A_2ndary_DMS,STRIDEP71_2A_DMS,by="position")
NS2A_2ndary_DMS <- NS2A_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2A_2ndary_DMS <- NS2A_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2A_2ndary_DMS_fitness_class  <- na.omit(NS2A_2ndary_DMS)
NS2A_2ndary_DMS_fitness_class$Class="N"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS2A_2ndary_DMS_fitness_class[NS2A_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS2A_2ndary_DMS_fitness_class_Counts <- table(NS2A_2ndary_DMS_fitness_class$Class,NS2A_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS2A_2ndary_DMS_fitness_class_Counts)
NS2A_2ndary_DMS_fitness_class$Class <- factor(NS2A_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))  
plot(as.factor(NS2A_2ndary_DMS_fitness_class$Secondary),as.factor(NS2A_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_2A.pdf",width = 4, height = 4)
plot(as.factor(NS2A_2ndary_DMS_fitness_class$Secondary),as.factor(NS2A_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
Rows: 150 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Secondary
dbl (1): Residue

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message in chisq.test(NS2A_2ndary_insertion_fitness_class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  NS2A_2ndary_insertion_fitness_class_Counts
X-squared = 11.99, df = 4, p-value = 0.01742
png: 2
Warning message in chisq.test(NS2A_2ndary_del_fitness_class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  NS2A_2ndary_del_fitness_class_Counts
X-squared = 5.3477, df = 4, p-value = 0.2534
No description has been provided for this image
png: 2
	Pearson's Chi-squared test

data:  NS2A_2ndary_DMS_fitness_class_Counts
X-squared = 63.698, df = 4, p-value = 4.839e-13
No description has been provided for this image
png: 2
No description has been provided for this image
In [110]:
#Function to create areaplots for different amino acids and tolerance at secondary structures in 2A
r2A_mutationtype_function <- function(dfi,AA,plotname) {
df  <- na.omit(dfi)
df  <- filter(df,AminoAcid==AA)
df$Class="N"
df[df$score<2,]$Class="N"
df[df$score<0,]$Class="L"
df[df$score>2,]$Class="B"
df$Class <- factor(df$Class, levels = c("B", "N", "L")) 
df_Counts <- table(df$Class,df$Secondary)
df_Counts
plot(as.factor(df$Secondary),as.factor(df$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf(plotname,width = 4, height = 4)
plot(as.factor(df$Secondary),as.factor(df$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
    return(df_Counts)
}

#Low alpha helix propensity
r2A_mutationtype_function(NS2A_2ndary_DMS,"G","Glycine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"P","Proline_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"Y","tyrosine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"I","IsoLeucine_DMS_2A.pdf")

#High alpha helix propensity
r2A_mutationtype_function(NS2A_2ndary_DMS,"A","Alanine_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"M","met_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"E","Glu_DMS_2A.pdf")
r2A_mutationtype_function(NS2A_2ndary_DMS,"L","Leucine_DMS_2A.pdf")
   
     E  H  L
  B  3  1  5
  N 16  6 21
  L 47  8 23
No description has been provided for this image
   
     E  H  L
  B  0  1  5
  N 11  4 10
  L 63 10 38
No description has been provided for this image
   
     E  H  L
  B  6  0  1
  N 15 11 17
  L 49  4 36
No description has been provided for this image
   
     E  H  L
  B  6  0  3
  N 27  7 13
  L 38  8 39
No description has been provided for this image
   
     E  H  L
  B  7  6  7
  N 29  3 19
  L 35  5 26
No description has been provided for this image
   
     E  H  L
  B  7  2  2
  N 30  6 26
  L 37  6 28
No description has been provided for this image
   
     E  H  L
  B  3  3  3
  N 17  5 21
  L 54  7 28
No description has been provided for this image
   
     E  H  L
  B  1  3  1
  N 24  4 21
  L 42  7 31
No description has been provided for this image
In [111]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 2C
parse2C_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {

dataframeindexed <- filter(dataframeoriginal,indel>1111,indel<1441)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1111)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)

}

#Insertion
parse2C_function(Scores_Insertional_Handle_Fullproteome,Insertions_2C_reindexed,"Insertions_2C_reindexed.csv")

#Deletion
parse2C_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_2C_reindexed,"Deletions_2C_reindexed.csv")

#DMS
Score_DMS_mean_2C <-  Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_2C <- Score_DMS_mean_2C %>% rename(indel = colnames(Score_DMS_mean_2C)[1])
parse2C_function(Score_DMS_mean_2C,DMS_2C_reindexed,"DMS_2C_reindexed.csv")
In [112]:
#STRIDE of 2C
STRIDEP71_2C <- readr::read_tsv(file.path("STRIDE_5GQ1.txt"),col_names = TRUE)
STRIDEP71_2C <- STRIDEP71_2C %>% complete(Residue = seq(1,329,1))
STRIDEP71_2C_DMS <- STRIDEP71_2C %>% rename(position = colnames(STRIDEP71_2C)[1])

#Areaplot enrich2 score class and secondary structures for insertions
NS2C_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>1111,indel<1441)
NS2C_2ndary_insertion <- cbind(NS2C_2ndary_insertion, STRIDEP71_2C$Secondary)
colnames(NS2C_2ndary_insertion)[colnames(NS2C_2ndary_insertion) == "STRIDEP71_2C$Secondary"] <- "Secondary"
NS2C_2ndary_insertion <- NS2C_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_insertion <- NS2C_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2C_2ndary_insertion_fitness_class  <- na.omit(NS2C_2ndary_insertion)
NS2C_2ndary_insertion_fitness_class$Class="N"
NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score<0,]$Class="L"
#NS2C_2ndary_insertion_fitness_class[NS2C_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS2C_2ndary_insertion_fitness_class_Counts <- table(NS2C_2ndary_insertion_fitness_class$Class,NS2C_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS2C_2ndary_insertion_fitness_class_Counts)
NS2C_2ndary_insertion_fitness_class$Class <- factor(NS2C_2ndary_insertion_fitness_class$Class, levels = c("N","L"))  
plot(as.factor(NS2C_2ndary_insertion_fitness_class$Secondary),as.factor(NS2C_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.8,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_insertion_fitness_class$Secondary),as.factor(NS2C_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for deletions
NS2C_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS2C_2ndary_del <- NS2C_2ndary_del %>% ungroup() 
NS2C_2ndary_del <- filter(NS2C_2ndary_del,indel>1111,indel<1441)
NS2C_2ndary_del <- cbind(NS2C_2ndary_del, STRIDEP71_2C$Secondary)
colnames(NS2C_2ndary_del)[colnames(NS2C_2ndary_del) == "STRIDEP71_2C$Secondary"] <- "Secondary"
NS2C_2ndary_del <- NS2C_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_del <- NS2C_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS2C_2ndary_del_fitness_class  <- na.omit(NS2C_2ndary_del)
NS2C_2ndary_del_fitness_class$Class="N"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score<2,]$Class="N"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score<0,]$Class="L"
NS2C_2ndary_del_fitness_class[NS2C_2ndary_del_fitness_class$score>2,]$Class="B"
NS2C_2ndary_del_fitness_class_Counts <- table(NS2C_2ndary_del_fitness_class$Class,NS2C_2ndary_del_fitness_class$Secondary)
chisq.test(NS2C_2ndary_del_fitness_class_Counts)
NS2C_2ndary_del_fitness_class$Class <- factor(NS2C_2ndary_del_fitness_class$Class, levels = c("B", "N","L"))  
plot(as.factor(NS2C_2ndary_del_fitness_class$Secondary),as.factor(NS2C_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.8,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_del_fitness_class$Secondary),as.factor(NS2C_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for DMS
NS2C_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS2C_2ndary_DMS <- filter(NS2C_2ndary_DMS,position>1111,position<1441)
NS2C_2ndary_DMS <- select(NS2C_2ndary_DMS,position,score)
NS2C_2ndary_DMS <- mutate(NS2C_2ndary_DMS,position=position-1111)
NS2C_2ndary_DMS <- merge(NS2C_2ndary_DMS,STRIDEP71_2C_DMS,by="position")
NS2C_2ndary_DMS <- NS2C_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS2C_2ndary_DMS <- NS2C_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
write.csv(NS2C_2ndary_DMS,file ="NS2C_2ndary_DMS.csv",)
NS2C_2ndary_DMS_fitness_class  <- na.omit(NS2C_2ndary_DMS)
NS2C_2ndary_DMS_fitness_class$Class="N"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS2C_2ndary_DMS_fitness_class[NS2C_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS2C_2ndary_DMS_fitness_class_Counts <- table(NS2C_2ndary_DMS_fitness_class$Class,NS2C_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS2C_2ndary_DMS_fitness_class_Counts)
NS2C_2ndary_DMS_fitness_class$Class <- factor(NS2C_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))  
plot(as.factor(NS2C_2ndary_DMS_fitness_class$Secondary),as.factor(NS2C_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.8,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_2C.pdf",width = 4, height = 4)
plot(as.factor(NS2C_2ndary_DMS_fitness_class$Secondary),as.factor(NS2C_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.8,1.0),lwd = 0.5)
dev.off()
Rows: 207 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Secondary
dbl (1): Residue

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
	Chi-squared test for given probabilities

data:  NS2C_2ndary_insertion_fitness_class_Counts
X-squared = 30.464, df = 2, p-value = 2.426e-07
png: 2
Warning message in chisq.test(NS2C_2ndary_del_fitness_class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  NS2C_2ndary_del_fitness_class_Counts
X-squared = 0.98555, df = 2, p-value = 0.6109
No description has been provided for this image
png: 2
Warning message in chisq.test(NS2C_2ndary_DMS_fitness_class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  NS2C_2ndary_DMS_fitness_class_Counts
X-squared = 16.477, df = 4, p-value = 0.002442
No description has been provided for this image
png: 2
No description has been provided for this image
In [113]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3A
parse3A_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {

dataframeindexed <- filter(dataframeoriginal,indel>1440,indel<1527)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1440)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "B")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)

}

#Insertion
parse3A_function(Scores_Insertional_Handle_Fullproteome,Insertions_3A_reindexed,"Insertions_3A_reindexed.csv")

#Deletion
parse3A_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3A_reindexed,"Deletions_3A_reindexed.csv")

#DMS
Score_DMS_mean_3A <-  Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3A <- Score_DMS_mean_3A %>% rename(indel = colnames(Score_DMS_mean_3A)[1])
parse3A_function(Score_DMS_mean_3A,DMS_3A_reindexed,"DMS_3A_reindexed.csv")
In [114]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3C

parse3C_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {

dataframeindexed <- filter(dataframeoriginal,indel>1548,indel<1732)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- mutate(dataframeindexed,score=2^score)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1548)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)

}

#Insertion
parse3C_function(Scores_Insertional_Handle_Fullproteome,Insertions_3C_reindexed,"Insertions_3C_reindexed.csv")

#Deletion
parse3C_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3C_reindexed,"Deletions_3C_reindexed.csv")

#DMS
Score_DMS_mean_3C <-  Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3C <- Score_DMS_mean_3C %>% rename(indel = colnames(Score_DMS_mean_3C)[1])
parse3C_function(Score_DMS_mean_3C,DMS_3C_reindexed,"DMS_3C_reindexed.csv")
In [115]:
#Function to Parse dataframe for exponentiated score as an attribute for chimera for 3D(pol)
parse3D_function <- function(dataframeoriginal,dataframeindexed,writedataframe,pos) {

dataframeindexed <- filter(dataframeoriginal,indel>1731,indel<2194)
dataframeindexed <- na.omit(dataframeindexed)
dataframeindexed <- dataframeindexed %>% mutate(indel=indel-1731)
dataframeindexed <-  mutate(dataframeindexed,score=2^score)
dataframeindexed$indel <- paste0(dataframeindexed$indel, ".")
dataframeindexed$indel <- paste0(dataframeindexed$indel, "A")
dataframeindexed$indel <- paste(":", dataframeindexed$indel, sep = "")
write.csv(dataframeindexed,file = writedataframe,)

}

#Insertion
parse3D_function(Scores_Insertional_Handle_Fullproteome,Insertions_3D_reindexed,"Insertions_3D_reindexed.csv")

#Deletion
parse3D_function(Scores_Deletions1AA_Fullproteome_1AA,Deletions_3D_reindexed,"Deletions_3D_reindexed.csv")

#DMS
Score_DMS_mean_3D <-  Fullproteome_P2_DMS_Enrich2_long %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
Score_DMS_mean_3D <- Score_DMS_mean_3D %>% rename(indel = colnames(Score_DMS_mean_3D)[1])
parse3D_function(Score_DMS_mean_3D,DMS_3D_reindexed,"DMS_3D_reindexed.csv")
In [116]:
#Heatmap showing contact sites of 3D with template RNA

#Reading in txt file with contact sites as identified by chimera
overlap_3D <- read_csv("Overlap_Residues_Selected_filtered.txt")
#Function to create a residue list that contain all sites with the contacts
create_DMS_heatmap_residue_list <- function(residues_df) {
     residues_list <- residues_df[[1]]
        df <- merged_df_indel_DMS
        df <- mutate(df,position=position-1731)
    df <- filter(df, position %in% residues_list)
  df$position <- factor(df$position, levels = residues_list)
       
#Creating heatmap with the contact sites to show mutational robustness
    DMS_Heatmap <- ggplot(df) +
    geom_tile(aes(position, AminoAcid, fill = score)) +
    scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
                         low = muted("hotpink"),
                         mid = "white",
                         high = muted("green"),
                         ,space = "Lab",
                         na.value = "black") +
    theme_classic()+ scale_x_discrete() +
    coord_fixed()+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
  
  filename <- paste0("DMS_Heatmap_3D_overlap", paste(collapse = "_"), ".pdf")
  ggsave(filename, height = 4, width = 10)

  return(DMS_Heatmap)
}

#Heatmap for 3D with only the contact sites
create_DMS_heatmap_residue_list(overlap_3D)
Rows: 48 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): Residue_Overlap_3D

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
No description has been provided for this image
In [117]:
#STRIDE of 3D
STRIDEP71_3D <- readr::read_tsv(file.path("STRIDE_3N6L.txt"),col_names = TRUE)
STRIDEP71_3D_DMS <- STRIDEP71_3D %>% rename(position = colnames(STRIDEP71_3D)[1])

#Areaplot enrich2 score class and secondary structures for insertions
NS3D_2ndary_insertion <- filter(Scores_Insertional_Handle_Fullproteome,indel>1731,indel<2194)
NS3D_2ndary_insertion <- cbind(NS3D_2ndary_insertion, STRIDEP71_3D$Secondary)
colnames(NS3D_2ndary_insertion)[colnames(NS3D_2ndary_insertion) == "STRIDEP71_3D$Secondary"] <- "Secondary"
NS3D_2ndary_insertion <- NS3D_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_insertion <- NS3D_2ndary_insertion %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS3D_2ndary_insertion_fitness_class  <- na.omit(NS3D_2ndary_insertion)
NS3D_2ndary_insertion_fitness_class$Class="N"
NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score<2,]$Class="N"
NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score<0,]$Class="L"
#NS3D_2ndary_insertion_fitness_class[NS3D_2ndary_insertion_fitness_class$score>2,]$Class="B"
NS3D_2ndary_insertion_fitness_class_Counts <- table(NS3D_2ndary_insertion_fitness_class$Class,NS3D_2ndary_insertion_fitness_class$Secondary)
chisq.test(NS3D_2ndary_insertion_fitness_class_Counts)
NS3D_2ndary_insertion_fitness_class$Class <- factor(NS3D_2ndary_insertion_fitness_class$Class, levels = c("N","L"))  
plot(as.factor(NS3D_2ndary_insertion_fitness_class$Secondary),as.factor(NS3D_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
pdf("insertion_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_insertion_fitness_class$Secondary),as.factor(NS3D_2ndary_insertion_fitness_class$Class),col = fitness_class_colors,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for deletions
NS3D_2ndary_del <- Scores_Deletions1AA_Fullproteome_1AA
NS3D_2ndary_del <- NS3D_2ndary_del %>% ungroup() 
NS3D_2ndary_del <- filter(NS3D_2ndary_del,indel>1731,indel<2194)
NS3D_2ndary_del <- cbind(NS3D_2ndary_del, STRIDEP71_3D$Secondary)
colnames(NS3D_2ndary_del)[colnames(NS3D_2ndary_del) == "STRIDEP71_3D$Secondary"] <- "Secondary"
NS3D_2ndary_del <- NS3D_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_del <- NS3D_2ndary_del %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
write.csv(NS3D_2ndary_del,file ="NS3D_2ndary_del.csv",)
NS3D_2ndary_del_fitness_class  <- na.omit(NS3D_2ndary_del)
NS3D_2ndary_del_fitness_class$Class="N"
NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score<2,]$Class="N"
NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score<0,]$Class="L"
#NS3D_2ndary_del_fitness_class[NS3D_2ndary_del_fitness_class$score>2,]$Class="B"
NS3D_2ndary_del_fitness_class_Counts <- table(NS3D_2ndary_del_fitness_class$Class,NS3D_2ndary_del_fitness_class$Secondary)
chisq.test(NS3D_2ndary_del_fitness_class_Counts)
NS3D_2ndary_del_fitness_class$Class <- factor(NS3D_2ndary_del_fitness_class$Class, levels = c("N","L"))  
plot(as.factor(NS3D_2ndary_del_fitness_class$Secondary),as.factor(NS3D_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
pdf("deletions_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_del_fitness_class$Secondary),as.factor(NS3D_2ndary_del_fitness_class$Class),col = fitness_class_colors_del,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()

#Areaplot enrich2 score class and secondary structures for DMS
NS3D_2ndary_DMS <- Fullproteome_P2_DMS_Enrich2_long
NS3D_2ndary_DMS <- filter(NS3D_2ndary_DMS,position>1731,position<2194)
NS3D_2ndary_DMS <- select(NS3D_2ndary_DMS,position,score,AminoAcid)
NS3D_2ndary_DMS <- mutate(NS3D_2ndary_DMS,position=position-1731)
NS3D_2ndary_DMS <- merge(NS3D_2ndary_DMS,STRIDEP71_3D_DMS,by="position")
NS3D_2ndary_DMS <- NS3D_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('B', 'T', 'C'), 'L', Secondary))
NS3D_2ndary_DMS <- NS3D_2ndary_DMS %>% mutate(Secondary = ifelse(Secondary %in% c('G', 'H'), 'H', Secondary))
NS3D_2ndary_DMS_fitness_class  <- na.omit(NS3D_2ndary_DMS)
NS3D_2ndary_DMS_fitness_class$Class="N"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score<2,]$Class="N"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score<0,]$Class="L"
NS3D_2ndary_DMS_fitness_class[NS3D_2ndary_DMS_fitness_class$score>2,]$Class="B"
NS3D_2ndary_DMS_fitness_class_Counts <- table(NS3D_2ndary_DMS_fitness_class$Class,NS3D_2ndary_DMS_fitness_class$Secondary)
chisq.test(NS3D_2ndary_DMS_fitness_class_Counts)
NS3D_2ndary_DMS_fitness_class$Class <- factor(NS3D_2ndary_DMS_fitness_class$Class, levels = c("B", "N", "L"))  
plot(as.factor(NS3D_2ndary_DMS_fitness_class$Secondary),as.factor(NS3D_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
pdf("DMS_fitness_class_2ndary_3D.pdf",width = 4, height = 4)
plot(as.factor(NS3D_2ndary_DMS_fitness_class$Secondary),as.factor(NS3D_2ndary_DMS_fitness_class$Class),col = fitness_class_colors_dms,ylim=c(0.4,1.0),lwd = 0.5)
dev.off()
Rows: 462 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Secondary
dbl (1): Residue

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
	Chi-squared test for given probabilities

data:  NS3D_2ndary_insertion_fitness_class_Counts
X-squared = 73.753, df = 2, p-value < 2.2e-16
png: 2
Warning message in chisq.test(NS3D_2ndary_del_fitness_class_Counts):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  NS3D_2ndary_del_fitness_class_Counts
X-squared = 2.6817, df = 2, p-value = 0.2616
No description has been provided for this image
png: 2
	Pearson's Chi-squared test

data:  NS3D_2ndary_DMS_fitness_class_Counts
X-squared = 32.706, df = 4, p-value = 1.372e-06
No description has been provided for this image
png: 2
No description has been provided for this image
In [118]:
#Create areaplots for different amino acids and tolerance at secondary structures in 3D

#Low alphahelix propensity
r2A_mutationtype_function(NS3D_2ndary_DMS,"G","Glycine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"P","Proline_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"Y","tyrosine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"I","IsoLeucine_DMS_3D.pdf")

#High alphahelix propensity
r2A_mutationtype_function(NS3D_2ndary_DMS,"A","Alanine_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"M","met_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"E","Glu_DMS_3D.pdf")
r2A_mutationtype_function(NS3D_2ndary_DMS,"L","Leucine_DMS_3D.pdf")
   
      E   H   L
  B   0   4   3
  N   8  34  29
  L  58 156 141
No description has been provided for this image
   
      E   H   L
  B   0   2   3
  N   1  14  14
  L  60 179 164
No description has been provided for this image
   
      E   H   L
  B   0   7   3
  N   7  18  24
  L  53 164 165
No description has been provided for this image
   
      E   H   L
  B   3   4   6
  N   9  39  35
  L  52 139 146
No description has been provided for this image
   
      E   H   L
  B   1   9   9
  N  12  40  43
  L  49 135 136
No description has been provided for this image
   
      E   H   L
  B   0  13   8
  N  11  53  40
  L  53 127 145
No description has been provided for this image
   
      E   H   L
  B   0   5   1
  N   3  15  23
  L  60 160 163
No description has been provided for this image
   
      E   H   L
  B   1   4   6
  N   7  35  28
  L  53 137 144
No description has been provided for this image
In [119]:
#Figure 5. Structural interpretation of mutational effects of the Capsid

#Creating a dataframe with relative enrichment focused on VP1
Scores_Insertional_Handle_VP1 <- filter(Scores_Insertional_Handle_Fullproteome,indel>565,indel<863)
Scores_Insertional_Handle_VP1 <- mutate(Scores_Insertional_Handle_VP1,score=2^score)
Scores_Deletions_VP1 <- filter(Scores_Deletions_Fullproteome,indel>565,indel<863)
Scores_Deletions_VP1 <- mutate(Scores_Deletions_VP1,score=2^score)
Scores_Deletions1AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="1AAdel")
Scores_Deletions2AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="2AAdel")
Scores_Deletions3AA_VP1 <- filter(Scores_Deletions_VP1,dataset=="3AAdel")
VP1_P2_DMS_Enrich2_long <- filter(Fullproteome_P2_DMS_Enrich2_long,position>565,position<863)
VP1_P2_DMS_Enrich2_long_exponention <- mutate(VP1_P2_DMS_Enrich2_long,score=2^score)


#Function to create geompoint of different mutation types at VP1. VP1 loops are annotated using geomrect
    plot_function_VP1_inserts <- function(data,color,plotname,lim,limm) {
    plot <- ggplot(data)+geom_point(aes(indel,score),color=color,size=0.5)+  
theme_classic() +coord_cartesian(ylim = c(0, 75)) +xlab("Residue Position") +ylab("Relative Enrichment") +  
    geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 100),alpha = 0.002,fill = "dimgrey",size = 0) +
    geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +  
    geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0)+
  
  geom_text(data = subset(data, score > 10), 
         aes(indel, score, label = paste(indel)), 
          vjust = -1,hjust=1,size = 1.5)


     ggsave(plot,filename=plotname,width = 4,height=1.5)
        return(plot)

    
    }

# Usage of the plotting function
plot_function_VP1_inserts(Scores_Insertional_Handle_VP1,"#1558a9","Scores_Insertional_Handle_VP1.pdf")
plot_function_VP1_inserts(Scores_Deletions2AA_VP1,"#a91515","Scores_Deletions2AA_VP1.pdf")
plot_function_VP1_inserts(Scores_Deletions3AA_VP1,"#a91515","Scores_Deletions3AA_VP1.pdf")

#Geompoint for 1AA deletion at VP1
Scores_Deletions1AA_VP1_plot <- ggplot(Scores_Deletions1AA_VP1)+geom_point(aes(indel,score),color= "#a91515",size=0.5)+  
theme_classic() +coord_cartesian(ylim = c(0, 120)) +xlab("Residue Position") +ylab("Relative Enrichment") +  
    geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 145),alpha = 0.002,fill = "dimgrey",size = 0) +
    geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +  
    geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 145), alpha = 0.002, fill = "dimgrey", size = 0)+
    geom_text(data = subset(Scores_Deletions1AA_VP1, score > 10), 
         aes(indel, score, label = paste(indel)), 
          vjust = -1,hjust=1,size = 1.5)
Scores_Deletions1AA_VP1_plot
#Save figure 
ggsave(Scores_Deletions1AA_VP1_plot,filename="Scores_Deletions1AA_VP1.pdf",width = 4,height=1.5)

#Geompoint for DMS at VP1
DMS_tophit_EV71_VP1 <- ggplot(VP1_P2_DMS_Enrich2_long_exponention)+geom_point(aes(position,score),color="#3c7430",size=0.5)+  
theme_classic() +coord_cartesian(ylim = c(0, 100)) +xlab("Residue Position") +ylab("Relative Enrichment") +  
    geom_rect(aes(xmin = 661, xmax = 670, ymin = 1, ymax = 100),alpha = 0.002,fill = "dimgrey",size = 0) +
    geom_rect(aes(xmin = 706, xmax = 714, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +  
    geom_rect(aes(xmin = 722, xmax = 742, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 748, xmax = 752, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0) +
    geom_rect(aes(xmin = 803, xmax = 811, ymin = 1, ymax = 100), alpha = 0.002, fill = "dimgrey", size = 0)+
  geom_text(data = subset(VP1_P2_DMS_Enrich2_long_exponention, score > 50), 
          aes(position, score, label = paste(position,AminoAcid)), 
           vjust = -2,hjust=1,size = 1.5)
DMS_tophit_EV71_VP1
#Save Figure
ggsave(DMS_tophit_EV71_VP1,filename="DMS_tophit_EV71_VP1.pdf",width = 4,height=1.5)
Warning message:
“Removed 2 rows containing missing values (`geom_point()`).”
Warning message:
“Removed 2 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 1 rows containing missing values (`geom_point()`).”
Warning message:
“Removed 297 rows containing missing values (`geom_point()`).”
No description has been provided for this image
Warning message:
“Removed 297 rows containing missing values (`geom_point()`).”
No description has been provided for this image
In [120]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: Insertional Handle
Insertions_Strucutral_P2 <- filter(Scores_Insertional_Handle_Fullproteome,indel<863)
Insertions_Strucutral_P2 <- Insertions_Strucutral_P2 %>% replace(is.na(.), "")
Insertions_Strucutral_P2 <- mutate(Insertions_Strucutral_P2,score=2^score)
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Insertions_Strucutral_P2_reindexed <- Insertions_Strucutral_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Insertions_Strucutral_P2_reindexed$indel <- paste0(Insertions_Strucutral_P2_reindexed$indel, ".")
Insertions_Strucutral_P2_reindexed$indel[1:69]  <- paste0(Insertions_Strucutral_P2_reindexed$indel[1:69], "D")
Insertions_Strucutral_P2_reindexed$indel[70:323]  <- paste0(Insertions_Strucutral_P2_reindexed$indel[70:323], "B")
Insertions_Strucutral_P2_reindexed$indel[324:565]  <- paste0(Insertions_Strucutral_P2_reindexed$indel[324:565], "C")
Insertions_Strucutral_P2_reindexed$indel[566:862]  <- paste0(Insertions_Strucutral_P2_reindexed$indel[566:862], "A")
Insertions_Strucutral_P2_reindexed$indel <- paste(":", Insertions_Strucutral_P2_reindexed$indel, sep = "")
write.csv(Insertions_Strucutral_P2_reindexed,file ="Capsid_Chimera_insertions_Attached_p2.csv",)
In [121]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 1 AA deletion
Deletions_Strucutral_3bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="1AAdel")
Deletions_Strucutral_3bp_P2 <- filter(Deletions_Strucutral_3bp_P2,indel<863)
Deletions_Strucutral_3bp_P2 <- mutate(Deletions_Strucutral_3bp_P2,score=2^score)
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_3bp_P2_reindexed <- Deletions_Strucutral_3bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_3bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel, ".")
Deletions_Strucutral_3bp_P2_reindexed$indel[1:69]  <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_3bp_P2_reindexed$indel[70:323]  <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_3bp_P2_reindexed$indel[324:565]  <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_3bp_P2_reindexed$indel[566:862]  <- paste0(Deletions_Strucutral_3bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_3bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_3bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_3bp_P2_reindexed <- na.omit(Deletions_Strucutral_3bp_P2_reindexed)
write.csv(Deletions_Strucutral_3bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_3bp_p2.csv",)
In [122]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 2 AA deletion
Deletions_Strucutral_6bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="2AAdel")
Deletions_Strucutral_6bp_P2 <- filter(Deletions_Strucutral_6bp_P2,indel<863)
Deletions_Strucutral_6bp_P2 <- mutate(Deletions_Strucutral_6bp_P2,score=2^score)
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_6bp_P2_reindexed <- Deletions_Strucutral_6bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_6bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel, ".")
Deletions_Strucutral_6bp_P2_reindexed$indel[1:69]  <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_6bp_P2_reindexed$indel[70:323]  <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_6bp_P2_reindexed$indel[324:565]  <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_6bp_P2_reindexed$indel[566:862]  <- paste0(Deletions_Strucutral_6bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_6bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_6bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_6bp_P2_reindexed <- na.omit(Deletions_Strucutral_6bp_P2_reindexed)
write.csv(Deletions_Strucutral_6bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_6bp_p2.csv",)
In [123]:
#Parsing dataframe for exponentiated score as an attribute for chimera for the capsid proteins: 3 AA deletion
Deletions_Strucutral_9bp_P2 <- filter(Scores_Deletions_Fullproteome,dataset=="3AAdel")
Deletions_Strucutral_9bp_P2 <- filter(Deletions_Strucutral_9bp_P2,indel<863)
Deletions_Strucutral_9bp_P2 <- mutate(Deletions_Strucutral_9bp_P2,score=2^score)
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2 %>% mutate(indel = ifelse(row_number() %in% c(70:323), indel - 69, indel))
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(324:565), indel - 323, indel))
Deletions_Strucutral_9bp_P2_reindexed <- Deletions_Strucutral_9bp_P2_reindexed %>% mutate(indel = ifelse(row_number() %in% c(566:862), indel - 565, indel))
Deletions_Strucutral_9bp_P2_reindexed$indel <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel, ".")
Deletions_Strucutral_9bp_P2_reindexed$indel[1:69]  <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[1:69], "D")
Deletions_Strucutral_9bp_P2_reindexed$indel[70:323]  <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[70:323], "B")
Deletions_Strucutral_9bp_P2_reindexed$indel[324:565]  <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[324:565], "C")
Deletions_Strucutral_9bp_P2_reindexed$indel[566:862]  <- paste0(Deletions_Strucutral_9bp_P2_reindexed$indel[566:862], "A")
Deletions_Strucutral_9bp_P2_reindexed$indel <- paste(":", Deletions_Strucutral_9bp_P2_reindexed$indel, sep = "")
Deletions_Strucutral_9bp_P2_reindexed <- na.omit(Deletions_Strucutral_9bp_P2_reindexed)
write.csv(Deletions_Strucutral_9bp_P2_reindexed,file ="Capsid_Chimera_deletions_Attached_9bp_p2.csv",)
In [124]:
#Parsing dataframe for mean exponentiated score as an attribute for chimera for the capsid proteins: DMS
DMS_Strucutral_P2 <- filter(Fullproteome_P2_DMS_Enrich2_long,position<863)
DMS_Strucutral_P2 <- mutate(DMS_Strucutral_P2,score=2^score)
DMS_Strucutral_P2 <-  DMS_Strucutral_P2 %>% group_by(position) %>% summarise(score = mean(score, na.rm = TRUE))
DMS_Strucutral_P2 <- DMS_Strucutral_P2 %>% replace(is.na(.), "")
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2 %>% mutate(position = ifelse(row_number() %in% c(70:323), position - 69, position))
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2_reindexed %>% mutate(position = ifelse(row_number() %in% c(324:565), position - 323, position))
DMS_Strucutral_P2_reindexed <- DMS_Strucutral_P2_reindexed %>% mutate(position = ifelse(row_number() %in% c(566:862), position - 565, position))
DMS_Strucutral_P2_reindexed$position <- paste0(DMS_Strucutral_P2_reindexed$position, ".")
DMS_Strucutral_P2_reindexed$position[1:69]  <- paste0(DMS_Strucutral_P2_reindexed$position[1:69], "D")
DMS_Strucutral_P2_reindexed$position[70:323]  <- paste0(DMS_Strucutral_P2_reindexed$position[70:323], "B")
DMS_Strucutral_P2_reindexed$position[324:565]  <- paste0(DMS_Strucutral_P2_reindexed$position[324:565], "C")
DMS_Strucutral_P2_reindexed$position[566:862]  <- paste0(DMS_Strucutral_P2_reindexed$position[566:862], "A")
DMS_Strucutral_P2_reindexed$position <- paste(":", DMS_Strucutral_P2_reindexed$position, sep = "")
write.csv(DMS_Strucutral_P2_reindexed,file ="Capsid_Chimera_DMS_Attached_p2.csv",)
In [125]:
#Figure 6. InDels as a contributor of Enterovirus A species evolution

#Function for creating a heatmap
create_DMS_heatmap <- function(lower_bound, upper_bound) {
  df <- filter(merged_df_indel_DMS, position > lower_bound, position < upper_bound)
  
  DMS_Heatmap <- ggplot(df) +
    geom_tile(aes(position, AminoAcid, fill = score)) +
    scale_fill_gradient2(midpoint = 0,limits = c(-8,8),
                         low = muted("hotpink"),
                         mid = "white",
                         high = muted("green"),
                         space = "Lab",
                         na.value = "black") +
    theme_classic() +
    coord_fixed()
  
  filename <- paste0("DMS_Heatmap_", lower_bound, "_to_", upper_bound, ".pdf")
  ggsave(filename, height = 6, width = 6)

  return(DMS_Heatmap)
}

#Create the heatmaps at N-termini of VP1 and 2A
create_DMS_heatmap(lower_bound = 549, upper_bound = 601)
create_DMS_heatmap(lower_bound = 849, upper_bound = 901)
No description has been provided for this image
No description has been provided for this image
In [126]:
#Supplementary Figure 7. Deep Insertion, Deletion, and Mutational Scanning at InDel hotspot regions 

#Parsing the input competition pooled experiment
Scores_Competition_input_plotting <- Scores_Competition_input
Scores_Competition_input_plotting$position <- gsub("p\\.\\((\\d+).*\\)", "\\1", Scores_Competition_input_plotting$hgvs)
Scores_Competition_input_plotting$Variant <- gsub("p\\.\\(\\d+(.*)\\)", "\\1", Scores_Competition_input_plotting$hgvs)
Scores_Competition_input_plotting$position <- as.numeric(Scores_Competition_input_plotting$position)

#Bar plot for input plasmid library at the N-terminus of VP1
Competition_input_plot_Capsid <-ggplot(Scores_Competition_input_plotting)+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count),binwidth = 1,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+ coord_cartesian(xlim = c(577, 612)) 
ggsave("Competition_input_plot_Capsid.pdf",width = 2,height=2)
Competition_input_plot_Capsid

#Bar plot for input plasmid library at the N-terminus of 2A
Competition_input_plot_Replication <-ggplot(Scores_Competition_input_plotting)+
stat_summary_bin(show.legend = F,geom = "bar",aes(position,count),binwidth = 1,fun = function(X) {log10(sum(X))} )+theme_classic()+
ylab(expression(Variant~Count~ (log[10])))+ xlab("Amino acid Position")+ coord_cartesian(xlim = c(863, 910)) 
Competition_input_plot_Replication
ggsave("Competition_input_plot_Replication.pdf",width = 2,height=2)

#Statistics included
#Count positions with count more than 1 
positions_above_1_competition <- sum(Scores_Competition_input_plotting$count > 1)
# Print the result
cat("Number of positions with count more than 1:", positions_above_1_competition/1932, "\n")
No description has been provided for this image
Number of positions with count more than 1: 1 
No description has been provided for this image
In [127]:
#Parsing the enrich2 score dataframes for the competition pooled dataframe at P2
Competition_Pool_Enrich2_Scores$position <- gsub("p\\.\\((\\d+).*\\)", "\\1", Competition_Pool_Enrich2_Scores$hgvs)
Competition_Pool_Enrich2_Scores$Variant <- gsub("p\\.\\(\\d+(.*)\\)", "\\1", Competition_Pool_Enrich2_Scores$hgvs)
Competition_Pool_Enrich2_Scores$position <- as.numeric(Competition_Pool_Enrich2_Scores$position)
Competition_Pool_Enrich2_Scores <- select(Competition_Pool_Enrich2_Scores,position,Variant,score)
Competition_Pool_Enrich2_Scores
Warning message in eval(expr, envir, enclos):
“NAs introduced by coercion”
A tibble: 1933 × 3
positionVariantscore
<dbl><chr><dbl>
577_1AAdel-5.0212407
577_2AAdel-6.7742591
577_3AAdel-5.0938696
577A -4.0681920
577C -5.4932380
577D -5.6746217
577E -5.3712305
577F -5.1615022
577G -5.0592367
577H -5.2259058
577handle -4.9600395
577K -1.8548324
577L -2.1140470
577M -0.7525873
577N -5.2285782
577P -4.3889911
577Q -4.4021126
577R -2.3668099
577S -4.7022723
577T -1.7286223
577V 1.0628603
577W -3.3041341
577Y -3.2116524
578_1AAdel-4.2644383
578_2AAdel-2.1168360
578_3AAdel-4.2882355
578A -3.4267051
578C -6.2138290
578D -4.6105516
578E -1.2980551
⋮⋮⋮
909R 1.1091144
909S 0.2636672
909T 0.8568695
909V 0.2414721
909W 1.0353956
909Y 1.1249580
910_1AAdel-6.2983892
910_2AAdel-5.0388709
910_3AAdel-5.9048265
910A 0.9576387
910C 0.3175552
910D 1.1818012
910E 1.4313338
910F 1.3379674
910G 0.5113716
910H 0.9969034
910handle -6.1848293
910I -4.6552593
910K -1.4978718
910L -0.6404986
910M 0.9430934
910N 0.6567146
910P -3.4641825
910R -1.2790269
910S -0.4744694
910T -3.2459876
910V -4.7139707
910W 1.4848147
910Y 1.5962951
NA_wt 0.0000000
In [128]:
#Creating a dataframe with the exponentiated scores
Competition_Pool_Enrich2_Scores$Variant <- factor(Competition_Pool_Enrich2_Scores$Variant, levels = c(levels = "handle","_1AAdel","_2AAdel","_3AAdel","P","G","Y","W","F","V","L","I","A","T","S","Q","N","M","C","E","D","R","K","H"))
Competition_Pool_Enrich2_Scores_exponen <- mutate(Competition_Pool_Enrich2_Scores,score=2^score)
Competition_Pool_Enrich2_Scores_exponen_VP1 <- filter(Competition_Pool_Enrich2_Scores_exponen,position<620)
Competition_Pool_Enrich2_Scores_exponen_2A <- filter(Competition_Pool_Enrich2_Scores_exponen,position>850)

#Boxplot for different variants for the competition pooled experiment
Boxplot_Competition_Variants <- ggplot(Competition_Pool_Enrich2_Scores, aes(y = score, x = Variant))+ylim(-6,6)+geom_boxplot(outlier.shape = NA,width = 0.5)+theme_classic()+labs(y="Median Scores",x="")+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Boxplot_Competition_Variants
ggsave("Boxplot_Competition_Variants.pdf", height = 2, width = 4)

#Geompoint for variants at the N-terminus of VP1
DMS_tophit_competitionpool_VP1 <- ggplot(Competition_Pool_Enrich2_Scores_exponen_VP1)+geom_point(aes(position,score),color="#7c7b7b",size=0.5)+  
theme_classic() +coord_cartesian(ylim = c(0, 20)) +xlab("Residue Position") +ylab("Relative Enrichment") +  
    geom_text(data = subset(Competition_Pool_Enrich2_Scores_exponen_VP1, score > 10), 
          aes(position, score, label = paste(position,Variant)), 
           vjust = 0,hjust=0,size = 1) 
#Save figure  
ggsave("DMS_tophit_competitionpool_VP1.pdf", height = 2, width = 2)

#Geompoint for variants at the N-terminus of 2A
DMS_tophit_competitionpool_2A <- ggplot(Competition_Pool_Enrich2_Scores_exponen_2A)+geom_point(aes(position,score),color="#7c7b7b",size=0.5)+  
theme_classic() +coord_cartesian(ylim = c(0, 20)) +xlab("Residue Position") +ylab("Relative Enrichment") +  
    geom_text(data = subset(Competition_Pool_Enrich2_Scores_exponen_2A, score > 10), 
          aes(position, score, label = paste(position,Variant)), 
           vjust = 0,hjust=0,size = 1) 
#Save figure  
ggsave("DMS_tophit_competitionpool_2A.pdf", height = 2, width = 2)

DMS_tophit_competitionpool_VP1
DMS_tophit_competitionpool_2A


#DMFE for deletions
Competition_Pool_Enrich2_Scores_Deletions <- filter(Competition_Pool_Enrich2_Scores,Variant %in% c("_1AAdel", "_2AAdel","_3AAdel"))
histo_competition_pool_Deletions  <- select(Competition_Pool_Enrich2_Scores_Deletions,score)
histo_competition_pool_histo_Deletions <- ggplot(Competition_Pool_Enrich2_Scores_Deletions, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_Deletions
ggsave("histo_competition_pool_histo_Deletions.pdf",width = 2,height=1.5)
#Inset for DMFE for deletions
histo_del_sel_nonlethal_competition_pool <- filter(histo_competition_pool_Deletions,score>0)
del_histogram_nonlethal_competition <- ggplot(histo_del_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#a91515", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
del_histogram_nonlethal_competition
#Save figure
ggsave("del_histogram_nonlethal_competition.pdf",width = 2,height=1.5)

#DMFE for Insertions
Competition_Pool_Enrich2_Scores_Insertions <- filter(Competition_Pool_Enrich2_Scores,Variant=="handle")
histo_competition_pool_Insertions  <- select(Competition_Pool_Enrich2_Scores_Insertions,score)
histo_competition_pool_histo_Insertions <- ggplot(histo_competition_pool_Insertions, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#1558a9", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_Insertions
ggsave("histo_competition_pool_Insertions.pdf",width = 2,height=1.5)
#Inset for DMFE for Insertions
histo_insertions_sel_nonlethal_competition_pool <- filter(histo_competition_pool_Insertions,score>0)
insertions_histogram_nonlethal_competition <- ggplot(histo_insertions_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#1558a9", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
insertions_histogram_nonlethal_competition
#Save figure
ggsave("insertions_histogram_nonlethal_competition.pdf",width = 2,height=1.5)

#DMFE for DMS
Competition_Pool_Enrich2_Scores_AA <- filter(Competition_Pool_Enrich2_Scores,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
histo_competition_pool_AA  <- select(Competition_Pool_Enrich2_Scores_AA,score)
histo_competition_pool_histo_AA <- ggplot(histo_competition_pool_AA, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#3c7430", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(-8,8))
histo_competition_pool_histo_AA
#Save figure
ggsave("histo_competition_pool_histo_AA.pdf",width = 2,height=1.5)
#Inset for DMFE for DMS

histo_AA_sel_nonlethal_competition_pool <- filter(histo_competition_pool_AA,score>0)
AA_histogram_nonlethal_competition <- ggplot(histo_AA_sel_nonlethal_competition_pool, aes(x = score)) +geom_histogram(binwidth = 0.5, fill = "#3c7430", color = "black",size = 0.3) +
labs(x = "Enrich2 Score", y = "Number of Variants") +theme_classic()+coord_cartesian(xlim=c(0,6))
AA_histogram_nonlethal_competition
#Save figure
ggsave("AA_histogram_nonlethal_competition.pdf",width = 2,height=1.5)
Warning message:
“Removed 160 rows containing non-finite values (`stat_boxplot()`).”
Warning message:
“Removed 160 rows containing non-finite values (`stat_boxplot()`).”
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [129]:
#Dataframe for pooled capsid Insertional Handle  
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- filter(Competition_Pool_Enrich2_Scores_Capsid_Insertion,Variant=="handle")
Competition_Pool_Enrich2_Scores_Capsid_Insertion <- Competition_Pool_Enrich2_Scores_Capsid_Insertion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_Insertion)[3])
Competition_Pool_Enrich2_Scores_Capsid_Insertion

#Dataframe for capsid Insertional Handle non-normalized experiment  
Capsid_insertion_N_terminus_VP1 <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_insertion_N_terminus_VP1 <- filter(Capsid_insertion_N_terminus_VP1,AminoAcid =="Ins")
Capsid_insertion_N_terminus_VP1$AminoAcid <- gsub("Ins", "handle", Capsid_insertion_N_terminus_VP1$AminoAcid)
Capsid_insertion_N_terminus_VP1 <- Capsid_insertion_N_terminus_VP1 %>% rename(Variant = colnames(Capsid_insertion_N_terminus_VP1)[2])
Capsid_insertion_N_terminus_VP1 <- Capsid_insertion_N_terminus_VP1 %>% rename(score_nonwt = colnames(Capsid_insertion_N_terminus_VP1)[3])
Capsid_insertion_N_terminus_VP1
Merged_Capsid_insertion_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Capsid_Insertion,Capsid_insertion_N_terminus_VP1,all.x=TRUE,all.y=TRUE)
Merged_Capsid_insertion_wt_nonwt

#Scatter plot capsid insertion wildtype vs non wildtype normalized
Scatter_plot_CapsidInsertion_wt_nonwt <- ggplot(Merged_Capsid_insertion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Enrich2 score", y = "Enrich2 scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidInsertion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidInsertion_wt_nonwt,filename = "Scatter_plot_CapsidInsertion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_insertion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_insertion_wt_nonwt)
summary_model_Capsid_Insertion <- summary(model_Capsid_insertion)
summary_model_Capsid_Insertion
A tibble: 36 × 3
positionVariantscore_wt
<dbl><fct><dbl>
577handle-4.9600395
578handle-6.4340859
579handle-5.2972617
580handle-6.2122032
581handle-5.5548090
582handle-6.2967363
583handle-4.6068378
584handle-6.1976971
585handle-0.1871479
586handle-3.3236263
587handle-4.0581977
588handle-4.0090099
589handle-3.5556873
590handle-2.9296332
591handle-3.1259828
592handle-2.6610715
593handle-2.9584803
594handle-4.9355758
595handle-5.1145645
596handle-6.0567015
597handle-6.0202949
598handle-5.1790274
599handle-5.1419044
600handle-6.0524865
601handle-6.1885227
602handle-6.5010098
603handle-4.8790012
604handle-6.8700823
605handle-5.5604899
606handle-6.5090689
607handle-6.3777329
608handle-6.6716882
609handle-6.4625171
610handle-6.5611980
611handle-6.2211639
612handle-6.0795715
A data.frame: 36 × 4
positionVariantscore_nonwtsub
<dbl><chr><dbl><chr>
577handle-2.296280C
578handle-3.992367C
579handle-3.218787C
580handle-3.369305C
581handle-3.725543C
582handle-3.813319C
583handle-2.461757C
584handle 2.109562C
585handle 6.061249C
586handle 1.479317C
587handle 1.679606C
588handle-2.499626C
589handle 1.083437C
590handle 3.218903C
591handle 1.051457C
592handle 1.730435C
593handle-3.813319C
594handle-3.929545C
595handle-3.934520C
596handle-3.813319C
597handle-3.846380C
598handle-3.840945C
599handle-3.484659C
600handle-3.888826C
601handle-3.924545C
602handle-4.175859C
603handle-3.208583C
604handle-4.268863C
605handle-3.360533C
606handle-4.086186C
607handle-4.103100C
608handle-3.978216C
609handle-3.767466C
610handle-4.081912C
611handle-3.784908C
612handle-3.537702C
A data.frame: 36 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><fct><dbl><dbl><chr>
577handle-4.9600395-2.296280C
578handle-6.4340859-3.992367C
579handle-5.2972617-3.218787C
580handle-6.2122032-3.369305C
581handle-5.5548090-3.725543C
582handle-6.2967363-3.813319C
583handle-4.6068378-2.461757C
584handle-6.1976971 2.109562C
585handle-0.1871479 6.061249C
586handle-3.3236263 1.479317C
587handle-4.0581977 1.679606C
588handle-4.0090099-2.499626C
589handle-3.5556873 1.083437C
590handle-2.9296332 3.218903C
591handle-3.1259828 1.051457C
592handle-2.6610715 1.730435C
593handle-2.9584803-3.813319C
594handle-4.9355758-3.929545C
595handle-5.1145645-3.934520C
596handle-6.0567015-3.813319C
597handle-6.0202949-3.846380C
598handle-5.1790274-3.840945C
599handle-5.1419044-3.484659C
600handle-6.0524865-3.888826C
601handle-6.1885227-3.924545C
602handle-6.5010098-4.175859C
603handle-4.8790012-3.208583C
604handle-6.8700823-4.268863C
605handle-5.5604899-3.360533C
606handle-6.5090689-4.086186C
607handle-6.3777329-4.103100C
608handle-6.6716882-3.978216C
609handle-6.4625171-3.767466C
610handle-6.5611980-4.081912C
611handle-6.2211639-3.784908C
612handle-6.0795715-3.537702C
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_insertion_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0259 -0.5170 -0.1867  0.5555  2.8669 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.11692    0.20625 -19.961  < 2e-16 ***
score_nonwt  0.44803    0.05886   7.612 7.62e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.925 on 34 degrees of freedom
Multiple R-squared:  0.6302,	Adjusted R-squared:  0.6193 
F-statistic: 57.94 on 1 and 34 DF,  p-value: 7.622e-09
No description has been provided for this image
In [130]:
#Dataframe for pooled replication Insertional Handle  
Competition_Pool_Enrich2_Scores_Replication_Insertion <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replication_Insertion <- filter(Competition_Pool_Enrich2_Scores_Replication_Insertion,Variant=="handle")
Competition_Pool_Enrich2_Scores_Replication_Insertion <- Competition_Pool_Enrich2_Scores_Replication_Insertion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replication_Insertion)[3])
Competition_Pool_Enrich2_Scores_Replication_Insertion

#Dataframe for replication Insertional Handle non-normalized experiment  
Replication_insertion_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_insertion_N_terminus_2A <- filter(Replication_insertion_N_terminus_2A,AminoAcid =="Ins")
Replication_insertion_N_terminus_2A$AminoAcid <- gsub("Ins", "handle", Replication_insertion_N_terminus_2A$AminoAcid)
Replication_insertion_N_terminus_2A <- Replication_insertion_N_terminus_2A %>% rename(Variant = colnames(Replication_insertion_N_terminus_2A)[2])
Replication_insertion_N_terminus_2A <- Replication_insertion_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_insertion_N_terminus_2A)[3])
Replication_insertion_N_terminus_2A
Merged_Replication_DMS_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Replication_Insertion,Replication_insertion_N_terminus_2A,all.x=TRUE,all.y=TRUE)

Merged_Replication_DMS_wt_nonwt

#Scatter plot replication insertion wildtype vs non wildtype normalized
Scatter_plot_ReplicationInsertion_wt_nonwt <- ggplot(Merged_Replication_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#1558a9")+labs(x = "Enrich2 score", y = "Enrich2 scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationInsertion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_ReplicationInsertion_wt_nonwt,filename = "Scatter_plot_ReplicationInsertion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Replication_insertion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_DMS_wt_nonwt)
summary_model_Replication_insertion <- summary(model_Replication_insertion)
summary_model_Replication_insertion
A tibble: 48 × 3
positionVariantscore_wt
<dbl><fct><dbl>
863handle 0.98036629
864handle 0.88918593
865handle 0.84169479
866handle 0.29912089
867handle 0.16526489
868handle 0.06288616
869handle-0.20596919
870handle-0.72549520
871handle-1.55513454
872handle-2.34895633
873handle-1.95661272
874handle-2.97841919
875handle-2.64947742
876handle-2.95698400
877handle-3.44412083
878handle-4.16802016
879handle-5.29303557
880handle-5.85205047
881handle-5.88254839
882handle-5.98024507
883handle-6.06716185
884handle-6.01373034
885handle-5.70197749
886handle-5.51815555
887handle-5.78259666
888handle-5.63373396
889handle-6.15858595
890handle-5.68991098
891handle-5.53246707
892handle-5.61100571
893handle-5.64331870
894handle-5.66533216
895handle-5.59444904
896handle-5.55355748
897handle-5.75738524
898handle-5.63373396
899handle-5.70795658
900handle-5.59110455
901handle-5.30174603
902handle-5.58438184
903handle-5.54306788
904handle-5.53246707
905handle-6.03114101
906handle-6.31801369
907handle-6.37620621
908handle-5.67769708
909handle-5.68687146
910handle-6.18482926
A data.frame: 48 × 4
positionVariantscore_nonwtsub
<dbl><chr><dbl><chr>
863handle 5.6617765D
864handle 5.6148419D
865handle 5.3758824D
866handle 5.2051635D
867handle 4.9301044D
868handle 4.5716542D
869handle 4.1886510D
870handle 3.5357299D
871handle 1.4974590D
872handle-5.8030172D
873handle 0.8720260D
874handle-5.0070473D
875handle-4.8366361D
876handle-4.8366361D
877handle-5.1762261D
878handle-4.9694842D
879handle-5.2511675D
880handle-5.1202114D
881handle-5.0070473D
882handle-5.2511675D
883handle-4.9884421D
884handle-4.8688970D
885handle-5.0432504D
886handle-4.7570487D
887handle-4.9304549D
888handle-4.7570487D
889handle-5.0162217D
890handle-4.8256470D
891handle-4.6833958D
892handle-4.4869624D
893handle-5.0695677D
894handle-4.8256470D
895handle-4.8366361D
896handle-4.6310963D
897handle-4.8145358D
898handle-4.6575879D
899handle-5.0695677D
900handle-4.8256470D
901handle-4.3720821D
902handle-4.7688135D
903handle-4.7804416D
904handle-4.8366361D
905handle-5.2725969D
906handle-5.5803933D
907handle-1.9350930D
908handle-0.6678258D
909handle-0.5988811D
910handle 0.5048040D
A data.frame: 48 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><fct><dbl><dbl><chr>
863handle 0.98036629 5.6617765D
864handle 0.88918593 5.6148419D
865handle 0.84169479 5.3758824D
866handle 0.29912089 5.2051635D
867handle 0.16526489 4.9301044D
868handle 0.06288616 4.5716542D
869handle-0.20596919 4.1886510D
870handle-0.72549520 3.5357299D
871handle-1.55513454 1.4974590D
872handle-2.34895633-5.8030172D
873handle-1.95661272 0.8720260D
874handle-2.97841919-5.0070473D
875handle-2.64947742-4.8366361D
876handle-2.95698400-4.8366361D
877handle-3.44412083-5.1762261D
878handle-4.16802016-4.9694842D
879handle-5.29303557-5.2511675D
880handle-5.85205047-5.1202114D
881handle-5.88254839-5.0070473D
882handle-5.98024507-5.2511675D
883handle-6.06716185-4.9884421D
884handle-6.01373034-4.8688970D
885handle-5.70197749-5.0432504D
886handle-5.51815555-4.7570487D
887handle-5.78259666-4.9304549D
888handle-5.63373396-4.7570487D
889handle-6.15858595-5.0162217D
890handle-5.68991098-4.8256470D
891handle-5.53246707-4.6833958D
892handle-5.61100571-4.4869624D
893handle-5.64331870-5.0695677D
894handle-5.66533216-4.8256470D
895handle-5.59444904-4.8366361D
896handle-5.55355748-4.6310963D
897handle-5.75738524-4.8145358D
898handle-5.63373396-4.6575879D
899handle-5.70795658-5.0695677D
900handle-5.59110455-4.8256470D
901handle-5.30174603-4.3720821D
902handle-5.58438184-4.7688135D
903handle-5.54306788-4.7804416D
904handle-5.53246707-4.8366361D
905handle-6.03114101-5.2725969D
906handle-6.31801369-5.5803933D
907handle-6.37620621-1.9350930D
908handle-5.67769708-0.6678258D
909handle-5.68687146-0.5988811D
910handle-6.18482926 0.5048040D
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_DMS_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6070 -0.3981 -0.2273  0.4363  3.5226 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.84137    0.22923  -12.39 2.89e-16 ***
score_nonwt  0.52217    0.04937   10.58 6.63e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.294 on 46 degrees of freedom
Multiple R-squared:  0.7086,	Adjusted R-squared:  0.7023 
F-statistic: 111.9 on 1 and 46 DF,  p-value: 6.629e-14
No description has been provided for this image
In [131]:
#Dataframe for pooled capsid Deletion   
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- filter(Competition_Pool_Enrich2_Scores_Capsid_Deletion,Variant %in% c("_1AAdel","_2AAdel","_3AAdel"))
Competition_Pool_Enrich2_Scores_Capsid_Deletion <- Competition_Pool_Enrich2_Scores_Capsid_Deletion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_Deletion)[3])
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_1AAdel", "1AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_2AAdel", "2AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant <- gsub("_3AAdel", "3AAdel", Competition_Pool_Enrich2_Scores_Capsid_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Capsid_Deletion

#Dataframe for capsid Deletion non-normalized experiment  
Capsid_deletion_N_terminus_2A <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_deletion_N_terminus_2A <- filter(Capsid_deletion_N_terminus_2A,AminoAcid %in% c("1AAdel","2AAdel","3AAdel"))
Capsid_deletion_N_terminus_2A <- Capsid_deletion_N_terminus_2A %>% rename(Variant = colnames(Capsid_deletion_N_terminus_2A)[2])
Capsid_deletion_N_terminus_2A <- Capsid_deletion_N_terminus_2A %>% rename(score_nonwt = colnames(Capsid_deletion_N_terminus_2A)[3])
Capsid_deletion_N_terminus_2A

Merged_Capsid_deletion_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Capsid_Deletion,Capsid_deletion_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Capsid_deletion_wt_nonwt

#Scatter plot capsid deletion wildtype vs non wildtype normalized
Scatter_plot_CapsidDeletion_wt_nonwt <- ggplot(Merged_Capsid_deletion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidDeletion_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidDeletion_wt_nonwt,filename = "Scatter_plot_CapsidDeletion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_deletion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_deletion_wt_nonwt)
summary_model_Capsid_deletion <- summary(model_Capsid_deletion)
summary_model_Capsid_deletion
A tibble: 108 × 3
positionVariantscore_wt
<dbl><chr><dbl>
5771AAdel-5.0212407
5772AAdel-6.7742591
5773AAdel-5.0938696
5781AAdel-4.2644383
5782AAdel-2.1168360
5783AAdel-4.2882355
5791AAdel 3.5386093
5792AAdel-0.5381575
5793AAdel-4.6268937
5801AAdel 1.2185610
5802AAdel-2.5369032
5803AAdel-6.3762062
5811AAdel-1.4574454
5812AAdel-2.5426362
5813AAdel-6.3608094
5821AAdel-2.5510748
5822AAdel-5.0276277
5823AAdel-6.4667129
5831AAdel-4.8748918
5832AAdel-6.3356707
5833AAdel-3.5367626
5841AAdel-2.1131554
5842AAdel 0.6757330
5843AAdel-1.3671723
5851AAdel 0.5374409
5852AAdel-0.7120180
5853AAdel-4.9806632
5861AAdel-1.2041708
5862AAdel-2.7770018
5863AAdel-3.7595006
⋮⋮⋮
6031AAdel-6.342015
6032AAdel-5.213287
6033AAdel-6.517064
6041AAdel-4.929478
6042AAdel-6.337260
6043AAdel-6.354584
6051AAdel-5.704178
6052AAdel-6.596136
6053AAdel-4.753775
6061AAdel-5.898499
6062AAdel-5.961616
6063AAdel-6.340433
6071AAdel-6.374677
6072AAdel-6.271609
6073AAdel-6.521037
6081AAdel-6.511741
6082AAdel-6.303332
6083AAdel-4.875257
6091AAdel-6.554829
6092AAdel-5.213596
6093AAdel-6.383817
6101AAdel-6.515736
6102AAdel-5.999529
6103AAdel-6.251043
6111AAdel-5.204323
6112AAdel-6.510406
6113AAdel-6.536775
6121AAdel-6.608324
6122AAdel-6.626332
6123AAdel-6.455485
A data.frame: 108 × 4
positionVariantscore_nonwtsub
<dbl><fct><dbl><chr>
5771AAdel-2.3578354C
5772AAdel-2.7752988C
5773AAdel-1.2981934C
5781AAdel 2.6041417C
5782AAdel-0.4533854C
5783AAdel-2.4517096C
5791AAdel 6.7995007C
5792AAdel 3.6447267C
5793AAdel-2.3689158C
5801AAdel 4.1870394C
5802AAdel 0.4254279C
5803AAdel-2.1659979C
5811AAdel 1.2360616C
5812AAdel 1.1252104C
5813AAdel-2.3853096C
5821AAdel-0.9166535C
5822AAdel-2.7379113C
5823AAdel-2.3503796C
5831AAdel-2.4120492C
5832AAdel-2.4618022C
5833AAdel-2.4173124C
5841AAdel-0.7457405C
5842AAdel 5.1846288C
5843AAdel 3.0234802C
5851AAdel 3.9092838C
5852AAdel 3.9308834C
5853AAdel-2.4517096C
5861AAdel 1.4321011C
5862AAdel-2.6951032C
5863AAdel-2.7467609C
⋮⋮⋮⋮
6031AAdel-2.415561C
6032AAdel-2.518728C
6033AAdel-2.619414C
6041AAdel-2.314185C
6042AAdel-2.296572C
6043AAdel-2.488228C
6051AAdel-2.507599C
6052AAdel-2.635017C
6053AAdel-2.401439C
6061AAdel-2.489857C
6062AAdel-2.520308C
6063AAdel-2.415561C
6071AAdel-2.406758C
6072AAdel-2.390715C
6073AAdel-2.625116C
6081AAdel-2.569587C
6082AAdel-2.426023C
6083AAdel-2.488228C
6091AAdel-2.539073C
6092AAdel-2.424287C
6093AAdel-2.266510C
6101AAdel-2.512384C
6102AAdel-2.476753C
6103AAdel-2.250099C
6111AAdel-2.321915C
6112AAdel-2.486597C
6113AAdel-2.453399C
6121AAdel-2.534415C
6122AAdel-2.692449C
6123AAdel-2.607911C
A data.frame: 108 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><chr><dbl><dbl><chr>
5771AAdel-5.0212407-2.3578354C
5772AAdel-6.7742591-2.7752988C
5773AAdel-5.0938696-1.2981934C
5781AAdel-4.2644383 2.6041417C
5782AAdel-2.1168360-0.4533854C
5783AAdel-4.2882355-2.4517096C
5791AAdel 3.5386093 6.7995007C
5792AAdel-0.5381575 3.6447267C
5793AAdel-4.6268937-2.3689158C
5801AAdel 1.2185610 4.1870394C
5802AAdel-2.5369032 0.4254279C
5803AAdel-6.3762062-2.1659979C
5811AAdel-1.4574454 1.2360616C
5812AAdel-2.5426362 1.1252104C
5813AAdel-6.3608094-2.3853096C
5821AAdel-2.5510748-0.9166535C
5822AAdel-5.0276277-2.7379113C
5823AAdel-6.4667129-2.3503796C
5831AAdel-4.8748918-2.4120492C
5832AAdel-6.3356707-2.4618022C
5833AAdel-3.5367626-2.4173124C
5841AAdel-2.1131554-0.7457405C
5842AAdel 0.6757330 5.1846288C
5843AAdel-1.3671723 3.0234802C
5851AAdel 0.5374409 3.9092838C
5852AAdel-0.7120180 3.9308834C
5853AAdel-4.9806632-2.4517096C
5861AAdel-1.2041708 1.4321011C
5862AAdel-2.7770018-2.6951032C
5863AAdel-3.7595006-2.7467609C
⋮⋮⋮⋮⋮
6031AAdel-6.342015-2.415561C
6032AAdel-5.213287-2.518728C
6033AAdel-6.517064-2.619414C
6041AAdel-4.929478-2.314185C
6042AAdel-6.337260-2.296572C
6043AAdel-6.354584-2.488228C
6051AAdel-5.704178-2.507599C
6052AAdel-6.596136-2.635017C
6053AAdel-4.753775-2.401439C
6061AAdel-5.898499-2.489857C
6062AAdel-5.961616-2.520308C
6063AAdel-6.340433-2.415561C
6071AAdel-6.374677-2.406758C
6072AAdel-6.271609-2.390715C
6073AAdel-6.521037-2.625116C
6081AAdel-6.511741-2.569587C
6082AAdel-6.303332-2.426023C
6083AAdel-4.875257-2.488228C
6091AAdel-6.554829-2.539073C
6092AAdel-5.213596-2.424287C
6093AAdel-6.383817-2.266510C
6101AAdel-6.515736-2.512384C
6102AAdel-5.999529-2.476753C
6103AAdel-6.251043-2.250099C
6111AAdel-5.204323-2.321915C
6112AAdel-6.510406-2.486597C
6113AAdel-6.536775-2.453399C
6121AAdel-6.608324-2.534415C
6122AAdel-6.626332-2.692449C
6123AAdel-6.455485-2.607911C
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_deletion_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3484 -0.8641 -0.3445  0.6851  3.3740 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.29606    0.14895  -22.13   <2e-16 ***
score_nonwt  0.91394    0.05857   15.61   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.141 on 106 degrees of freedom
Multiple R-squared:  0.6967,	Adjusted R-squared:  0.6939 
F-statistic: 243.5 on 1 and 106 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [132]:
#Dataframe for pooled replication Deletion   
Competition_Pool_Enrich2_Scores_Replication_Deletion <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replication_Deletion <- filter(Competition_Pool_Enrich2_Scores_Replication_Deletion,Variant %in% c("_1AAdel","_2AAdel","_3AAdel"))
Competition_Pool_Enrich2_Scores_Replication_Deletion <- Competition_Pool_Enrich2_Scores_Replication_Deletion %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replication_Deletion)[3])
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_1AAdel", "1AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_2AAdel", "2AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant <- gsub("_3AAdel", "3AAdel", Competition_Pool_Enrich2_Scores_Replication_Deletion$Variant)
Competition_Pool_Enrich2_Scores_Replication_Deletion

#Dataframe for replication Deletion non-normalized experiment  
Replication_deletion_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_deletion_N_terminus_2A <- filter(Replication_deletion_N_terminus_2A,AminoAcid %in% c("1AAdel","2AAdel","3AAdel"))
Replication_deletion_N_terminus_2A <- Replication_deletion_N_terminus_2A %>% rename(Variant = colnames(Replication_deletion_N_terminus_2A)[2])
Replication_deletion_N_terminus_2A <- Replication_deletion_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_deletion_N_terminus_2A)[3])
Replication_deletion_N_terminus_2A

Merged_Replication_deletion_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Replication_Deletion,Replication_deletion_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Replication_deletion_wt_nonwt

#Scatter plot replication deletion wildtype vs non wildtype normalized
Scatter_plot_ReplicationDeletion_wt_nonwt <- ggplot(Merged_Replication_deletion_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#a91515")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationDeletion_wt_nonwt
#Save figute
ggsave(plot = Scatter_plot_ReplicationDeletion_wt_nonwt,filename = "Scatter_plot_ReplicationDeletion_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Replication_deletion <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_deletion_wt_nonwt)
summary_model_Replication_deletion <- summary(model_Replication_deletion)
summary_model_Replication_deletion
A tibble: 144 × 3
positionVariantscore_wt
<dbl><chr><dbl>
8631AAdel-6.37467716
8632AAdel-4.75496860
8633AAdel-2.68158048
8641AAdel 1.10097240
8642AAdel-3.36379488
8643AAdel-2.64838944
8651AAdel 1.83509394
8652AAdel-1.72911126
8653AAdel-4.07427109
8661AAdel 0.43105445
8662AAdel-3.53712924
8663AAdel-3.88133406
8671AAdel 0.67720653
8672AAdel-4.84949044
8673AAdel-4.47663626
8681AAdel 0.47879095
8682AAdel-1.73860582
8683AAdel-6.50638976
8691AAdel 0.04180394
8692AAdel-3.90495543
8693AAdel-6.56373446
8701AAdel-0.87970812
8702AAdel-6.21758924
8703AAdel-6.23533594
8711AAdel-0.35578323
8712AAdel-5.83118922
8713AAdel-6.44698002
8721AAdel-4.74296224
8722AAdel-6.26821087
8723AAdel-6.60953454
⋮⋮⋮
9011AAdel-6.429751
9012AAdel-6.256224
9013AAdel-6.527625
9021AAdel-6.581312
9022AAdel-6.071316
9023AAdel-6.544552
9031AAdel-6.278372
9032AAdel-6.274996
9033AAdel-5.928528
9041AAdel-5.305298
9042AAdel-5.966531
9043AAdel-6.422483
9051AAdel-5.090539
9052AAdel-5.655986
9053AAdel-5.259759
9061AAdel-3.084304
9062AAdel-5.869954
9063AAdel-6.426850
9071AAdel-4.695391
9072AAdel-5.740217
9073AAdel-6.194037
9081AAdel-5.877530
9082AAdel-5.420188
9083AAdel-6.397371
9091AAdel-4.174291
9092AAdel-5.809883
9093AAdel-5.002473
9101AAdel-6.298389
9102AAdel-5.038871
9103AAdel-5.904827
A data.frame: 144 × 4
positionVariantscore_nonwtsub
<dbl><fct><dbl><chr>
8631AAdel-2.66641372D
8632AAdel-4.88781015D
8633AAdel 1.18925684D
8641AAdel 7.19446630D
8642AAdel 0.09504572D
8643AAdel 1.18637549D
8651AAdel 7.25664801D
8652AAdel 2.59107135D
8653AAdel 1.75828100D
8661AAdel 6.18943321D
8662AAdel 1.71522700D
8663AAdel 0.19412594D
8671AAdel 6.18370269D
8672AAdel 2.31402344D
8673AAdel 0.07227784D
8681AAdel 6.40225455D
8682AAdel 2.44891091D
8683AAdel-4.65007706D
8691AAdel 5.95561786D
8692AAdel-0.34340309D
8693AAdel-5.01976600D
8701AAdel 5.21598662D
8702AAdel-4.71201636D
8703AAdel-4.69383404D
8711AAdel 4.95562685D
8712AAdel-4.13498249D
8713AAdel-4.85711721D
8721AAdel 2.35439207D
8722AAdel-4.71201636D
8723AAdel-4.91758903D
⋮⋮⋮⋮
9011AAdel-4.753199D
9012AAdel-4.371934D
9013AAdel-4.650077D
9021AAdel-4.610984D
9022AAdel-4.363459D
9023AAdel-4.955963D
9031AAdel-4.405132D
9032AAdel-4.236593D
9033AAdel-4.922467D
9041AAdel-4.113360D
9042AAdel-4.166564D
9043AAdel-4.363459D
9051AAdel-4.549321D
9052AAdel-4.437263D
9053AAdel-4.236593D
9061AAdel-4.468394D
9062AAdel-3.779071D
9063AAdel-4.687699D
9071AAdel-3.934312D
9072AAdel-4.302037D
9073AAdel-4.371934D
9081AAdel-3.907820D
9082AAdel-4.363459D
9083AAdel-4.556363D
9091AAdel 1.005060D
9092AAdel-4.009819D
9093AAdel-4.535086D
9101AAdel-4.197179D
9102AAdel-4.009819D
9103AAdel-4.460702D
A data.frame: 144 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><chr><dbl><dbl><chr>
8631AAdel-6.37467716-2.66641372D
8632AAdel-4.75496860-4.88781015D
8633AAdel-2.68158048 1.18925684D
8641AAdel 1.10097240 7.19446630D
8642AAdel-3.36379488 0.09504572D
8643AAdel-2.64838944 1.18637549D
8651AAdel 1.83509394 7.25664801D
8652AAdel-1.72911126 2.59107135D
8653AAdel-4.07427109 1.75828100D
8661AAdel 0.43105445 6.18943321D
8662AAdel-3.53712924 1.71522700D
8663AAdel-3.88133406 0.19412594D
8671AAdel 0.67720653 6.18370269D
8672AAdel-4.84949044 2.31402344D
8673AAdel-4.47663626 0.07227784D
8681AAdel 0.47879095 6.40225455D
8682AAdel-1.73860582 2.44891091D
8683AAdel-6.50638976-4.65007706D
8691AAdel 0.04180394 5.95561786D
8692AAdel-3.90495543-0.34340309D
8693AAdel-6.56373446-5.01976600D
8701AAdel-0.87970812 5.21598662D
8702AAdel-6.21758924-4.71201636D
8703AAdel-6.23533594-4.69383404D
8711AAdel-0.35578323 4.95562685D
8712AAdel-5.83118922-4.13498249D
8713AAdel-6.44698002-4.85711721D
8721AAdel-4.74296224 2.35439207D
8722AAdel-6.26821087-4.71201636D
8723AAdel-6.60953454-4.91758903D
⋮⋮⋮⋮⋮
9011AAdel-6.429751-4.753199D
9012AAdel-6.256224-4.371934D
9013AAdel-6.527625-4.650077D
9021AAdel-6.581312-4.610984D
9022AAdel-6.071316-4.363459D
9023AAdel-6.544552-4.955963D
9031AAdel-6.278372-4.405132D
9032AAdel-6.274996-4.236593D
9033AAdel-5.928528-4.922467D
9041AAdel-5.305298-4.113360D
9042AAdel-5.966531-4.166564D
9043AAdel-6.422483-4.363459D
9051AAdel-5.090539-4.549321D
9052AAdel-5.655986-4.437263D
9053AAdel-5.259759-4.236593D
9061AAdel-3.084304-4.468394D
9062AAdel-5.869954-3.779071D
9063AAdel-6.426850-4.687699D
9071AAdel-4.695391-3.934312D
9072AAdel-5.740217-4.302037D
9073AAdel-6.194037-4.371934D
9081AAdel-5.877530-3.907820D
9082AAdel-5.420188-4.363459D
9083AAdel-6.397371-4.556363D
9091AAdel-4.174291 1.005060D
9092AAdel-5.809883-4.009819D
9093AAdel-5.002473-4.535086D
9101AAdel-6.298389-4.197179D
9102AAdel-5.038871-4.009819D
9103AAdel-5.904827-4.460702D
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_deletion_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6654 -0.5129 -0.2778  0.3367  3.6205 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.42809    0.12554  -27.31   <2e-16 ***
score_nonwt  0.53759    0.02835   18.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.998 on 142 degrees of freedom
Multiple R-squared:  0.717,	Adjusted R-squared:  0.715 
F-statistic: 359.7 on 1 and 142 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [133]:
#Dataframe for pooled capsid DMS   
Competition_Pool_Enrich2_Scores_Capsid_DMS <- filter(Competition_Pool_Enrich2_Scores,position<613)
Competition_Pool_Enrich2_Scores_Capsid_DMS <- filter(Competition_Pool_Enrich2_Scores_Capsid_DMS,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Competition_Pool_Enrich2_Scores_Capsid_DMS <- Competition_Pool_Enrich2_Scores_Capsid_DMS %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Capsid_DMS)[3])
Competition_Pool_Enrich2_Scores_Capsid_DMS

#Dataframe for capsid DMS non-normalized experiment  
Capsid_DMS_N_terminus_VP1 <- filter(merged_df_indel_DMS,position>576 & position<613)
Capsid_DMS_N_terminus_VP1 <- filter(Capsid_DMS_N_terminus_VP1,AminoAcid %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Capsid_DMS_N_terminus_VP1 <- Capsid_DMS_N_terminus_VP1 %>% rename(Variant = colnames(Capsid_DMS_N_terminus_VP1)[2])
Capsid_DMS_N_terminus_VP1 <- Capsid_DMS_N_terminus_VP1 %>% rename(score_nonwt = colnames(Capsid_DMS_N_terminus_VP1)[3])
Capsid_DMS_N_terminus_VP1
Merged_Capsid_DMS_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Capsid_DMS,Capsid_DMS_N_terminus_VP1,all.x=TRUE,all.y=TRUE)
Merged_Capsid_DMS_wt_nonwt

#Scatter plot capsid DMS wildtype vs non wildtype normalized
Scatter_plot_CapsidDMS_wt_nonwt <- ggplot(Merged_Capsid_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Enrich2 score", y = "Enrich2 score WT normalized") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_CapsidDMS_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_CapsidDMS_wt_nonwt,filename = "Scatter_plot_CapsidDMS_wt_nonwt.pdf",device = pdf,width = 2,height=2)
#Linear model for wildtype vs nonwildtype experiment
model_Capsid_DMS <- lm(`score_wt`~ `score_nonwt`, data = Merged_Capsid_DMS_wt_nonwt)
summary_model_Capsid_DMS <- summary(model_Capsid_DMS)
summary_model_Capsid_DMS
A tibble: 684 × 3
positionVariantscore_wt
<dbl><fct><dbl>
577A-4.0681920
577C-5.4932380
577D-5.6746217
577E-5.3712305
577F-5.1615022
577G-5.0592367
577H-5.2259058
577K-1.8548324
577L-2.1140470
577M-0.7525873
577N-5.2285782
577P-4.3889911
577Q-4.4021126
577R-2.3668099
577S-4.7022723
577T-1.7286223
577V 1.0628603
577W-3.3041341
577Y-3.2116524
578A-3.4267051
578C-6.2138290
578D-4.6105516
578E-1.2980551
578F-5.8688034
578H-5.2044387
578I-5.1851741
578K-5.5388190
578L-4.2710684
578M-5.9569094
578N-5.0356990
⋮⋮⋮
611L-4.620711
611M-5.519603
611N-5.876337
611P-3.313134
611Q-4.451732
611R-3.899632
611S-2.817052
611T-1.878903
611V-1.079373
611W-5.388449
611Y-5.721858
612A-3.521517
612C-5.183790
612D-5.909934
612E-5.746796
612F-1.967298
612G-3.988101
612H-2.179719
612I-3.771855
612K-5.571930
612M-5.527930
612N-4.336194
612P-1.406304
612Q-5.722649
612R-3.061978
612S-2.072730
612T-3.236892
612V-4.351154
612W-5.780988
612Y-5.449361
A data.frame: 720 × 4
positionVariantscore_nonwtsub
<dbl><fct><dbl><chr>
577A-5.1246825C
577C-5.2262348C
577D-4.7012331C
577E-5.2849820C
577F-5.6736885C
577G-6.4046472C
577H-5.4793909C
577I NAC
577K-4.1152428C
577L-4.7887246C
577M-0.7580631C
577N-4.7874430C
577P-5.0204239C
577Q-5.2350027C
577R-5.5286590C
577S-6.5757933C
577T-4.7898422C
577V-0.5469281C
577W-5.0126960C
577Y-4.8839610C
578A-4.6214137C
578C-5.4494128C
578D-5.2709751C
578E-5.1585080C
578F-5.6481755C
578G NAC
578H-5.3404873C
578I-5.2851494C
578K-4.4227698C
578L-4.4423021C
⋮⋮⋮⋮
611M-5.0338953C
611N-5.6659750C
611P-4.8742732C
611Q-3.8101310C
611R-4.6766472C
611S-4.8079484C
611T-0.9246352C
611V-4.5783403C
611W-4.6243165C
611Y-5.3610731C
612A-5.7091347C
612C-5.0978707C
612D-5.4483826C
612E-5.6151519C
612F-5.4403390C
612G-4.9803114C
612H-5.0385042C
612I-3.1776197C
612K-5.4654803C
612L NAC
612M-5.0200747C
612N-4.9585691C
612P-5.5663309C
612Q-5.5409772C
612R-4.7718689C
612S-1.8884444C
612T-5.4643457C
612V-2.1406578C
612W-5.1927246C
612Y-4.9886971C
A data.frame: 720 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><fct><dbl><dbl><chr>
577P-4.3889911-5.0204239C
577G-5.0592367-6.4046472C
577Y-3.2116524-4.8839610C
577W-3.3041341-5.0126960C
577F-5.1615022-5.6736885C
577V 1.0628603-0.5469281C
577L-2.1140470-4.7887246C
577I NA NAC
577A-4.0681920-5.1246825C
577T-1.7286223-4.7898422C
577S-4.7022723-6.5757933C
577Q-4.4021126-5.2350027C
577N-5.2285782-4.7874430C
577M-0.7525873-0.7580631C
577C-5.4932380-5.2262348C
577E-5.3712305-5.2849820C
577D-5.6746217-4.7012331C
577R-2.3668099-5.5286590C
577K-1.8548324-4.1152428C
577H-5.2259058-5.4793909C
578P-4.6974161-5.0899979C
578G NA NAC
578Y-4.7477759-5.6949842C
578W-2.2799074-1.9766006C
578F-5.8688034-5.6481755C
578V-2.0448772-4.9806819C
578L-4.2710684-4.4423021C
578I-5.1851741-5.2851494C
578A-3.4267051-4.6214137C
578T-5.0370794-6.1043482C
⋮⋮⋮⋮⋮
611S-2.817052-4.807948C
611Q-4.451732-3.810131C
611N-5.876337-5.665975C
611M-5.519603-5.033895C
611C-4.839482-5.190619C
611E-2.839048-2.829627C
611D-4.436985-4.531521C
611R-3.899632-4.676647C
611K-6.488425-4.928408C
611H-1.862492-1.780014C
612P-1.406304-5.566331C
612G-3.988101-4.980311C
612Y-5.449361-4.988697C
612W-5.780988-5.192725C
612F-1.967298-5.440339C
612V-4.351154-2.140658C
612L NA NAC
612I-3.771855-3.177620C
612A-3.521517-5.709135C
612T-3.236892-5.464346C
612S-2.072730-1.888444C
612Q-5.722649-5.540977C
612N-4.336194-4.958569C
612M-5.527930-5.020075C
612C-5.183790-5.097871C
612E-5.746796-5.615152C
612D-5.909934-5.448383C
612R-3.061978-4.771869C
612K-5.571930-5.465480C
612H-2.179719-5.038504C
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 36 rows containing non-finite values (`stat_smooth()`).”
Warning message:
“Removed 36 rows containing missing values (`geom_point()`).”
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 36 rows containing non-finite values (`stat_smooth()`).”
Warning message:
“Removed 36 rows containing missing values (`geom_point()`).”
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Capsid_DMS_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2593 -0.8063  0.0975  0.6572  4.3371 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.96786    0.06285   15.40   <2e-16 ***
score_nonwt  0.96218    0.01809   53.18   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.136 on 682 degrees of freedom
  (36 observations deleted due to missingness)
Multiple R-squared:  0.8057,	Adjusted R-squared:  0.8054 
F-statistic:  2828 on 1 and 682 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [134]:
#Dataframe for pooled replication DMS   
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- filter(Competition_Pool_Enrich2_Scores,position>862)
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- filter(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS,Variant %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS <- Competition_Pool_Enrich2_Scores_Replicationproteins_DMS %>% rename(score_wt = colnames(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS)[3])
Competition_Pool_Enrich2_Scores_Replicationproteins_DMS

#Dataframe for replication DMS non-normalized experiment  
Replication_DMS_N_terminus_2A <- filter(merged_df_indel_DMS,position>862 & position<911)
Replication_DMS_N_terminus_2A <- filter(Replication_DMS_N_terminus_2A,AminoAcid %in% c("Y","W","V","T","S","R","Q","P","N","M","L","K","I","H","G","F","E","D","C","A"))
Replication_DMS_N_terminus_2A <- Replication_DMS_N_terminus_2A %>% rename(Variant = colnames(Replication_DMS_N_terminus_2A)[2])
Replication_DMS_N_terminus_2A <- Replication_DMS_N_terminus_2A %>% rename(score_nonwt = colnames(Replication_DMS_N_terminus_2A)[3])
Replication_DMS_N_terminus_2A
Merged_Replication_DMS_wt_nonwt  <- merge(Competition_Pool_Enrich2_Scores_Replicationproteins_DMS,Replication_DMS_N_terminus_2A,all.x=TRUE,all.y=TRUE)
Merged_Replication_DMS_wt_nonwt

#Scatter plot replication DMS wildtype vs non wildtype normalized
Scatter_plot_ReplicationDMS_wt_nonwt <- ggplot(Merged_Replication_DMS_wt_nonwt, aes(x = `score_nonwt`, y = `score_wt`)) + geom_point(size = 0.05,color="#3c7430")+labs(x = "Enrich2 Score", y = "Enrich2 Scorewt") +theme_classic()+
coord_fixed(xlim=c(-8,8),ylim=c(-8,8))+ geom_smooth(method="lm",color="black",lwd=0.3)
Scatter_plot_ReplicationDMS_wt_nonwt
#Save figure
ggsave(plot = Scatter_plot_ReplicationDMS_wt_nonwt,filename = "Scatter_plot_ReplicationDMS_wt_nonwt.pdf",device = pdf,width = 2,height=2)
model_Replication_DMS <- lm(`score_wt`~ `score_nonwt`, data = Merged_Replication_DMS_wt_nonwt)
#Linear model for wildtype vs nonwildtype experiment
summary_model_Replication_DMS <- summary(model_Replication_DMS)
summary_model_Replication_DMS
A tibble: 912 × 3
positionVariantscore_wt
<dbl><fct><dbl>
863A-2.63880062
863C-6.55351361
863D-4.83644883
863E-1.12910875
863F-6.00888197
863H-5.41506872
863I-5.19235341
863K-4.49803959
863L-4.31494999
863M-6.12968562
863N-3.83184087
863P-4.45111377
863Q-4.27688280
863R-1.01681740
863S-2.77112089
863T-5.63208995
863V-3.21275793
863W-3.53544384
863Y-5.54266438
864A 1.13692234
864C 1.16364994
864D 0.88141774
864E 0.81518223
864F-0.40552718
864G 0.67202014
864H 1.10202831
864I 0.82939297
864L 1.18361369
864M 1.51883201
864N-0.07915795
⋮⋮⋮
909L-1.0403038
909M 1.1010109
909N 0.9882308
909P-2.4580308
909Q 1.2878565
909R 1.1091144
909S 0.2636672
909T 0.8568695
909V 0.2414721
909W 1.0353956
909Y 1.1249580
910A 0.9576387
910C 0.3175552
910D 1.1818012
910E 1.4313338
910F 1.3379674
910G 0.5113716
910H 0.9969034
910I-4.6552593
910K-1.4978718
910L-0.6404986
910M 0.9430934
910N 0.6567146
910P-3.4641825
910R-1.2790269
910S-0.4744694
910T-3.2459876
910V-4.7139707
910W 1.4848147
910Y 1.5962951
A data.frame: 960 × 4
positionVariantscore_nonwtsub
<dbl><fct><dbl><chr>
863A-0.6993640D
863C-4.6222291D
863D-2.7010239D
863E-0.3387277D
863F-2.7118695D
863G NAD
863H-4.0723297D
863I-4.3739400D
863K-2.7132862D
863L-4.9859243D
863M-3.3827474D
863N-2.3354482D
863P-4.0953320D
863Q-3.9074147D
863R-4.1777451D
863S-3.6059825D
863T-3.3654838D
863V-3.7654225D
863W-1.2054866D
863Y-4.8484681D
864A 2.0226372D
864C 1.7417477D
864D 1.5003729D
864E 1.6555950D
864F-0.3130177D
864G 1.3599511D
864H 1.9457185D
864I 1.7591695D
864K NAD
864L 1.9208852D
⋮⋮⋮⋮
909M 2.1335153D
909N 1.9037633D
909P-4.2713596D
909Q 2.1386790D
909R 2.0586316D
909S 2.5828672D
909T 1.9927668D
909V 1.4718492D
909W 1.6052248D
909Y 1.6508510D
910A 1.8229694D
910C 1.3766078D
910D 2.0644396D
910E 1.6048646D
910F 1.9512124D
910G 1.2942231D
910H 1.6815032D
910I-3.8056666D
910K-0.2243074D
910L-0.2942483D
910M 1.5960049D
910N 1.4869589D
910P-4.0244476D
910Q NAD
910R-3.6907624D
910S 1.1593934D
910T-2.5391106D
910V-4.1553262D
910W 2.1430312D
910Y 2.6162566D
A data.frame: 960 × 5
positionVariantscore_wtscore_nonwtsub
<dbl><fct><dbl><dbl><chr>
863P-4.4511138-4.0953320D
863G NA NAD
863Y-5.5426644-4.8484681D
863W-3.5354438-1.2054866D
863F-6.0088820-2.7118695D
863V-3.2127579-3.7654225D
863L-4.3149500-4.9859243D
863I-5.1923534-4.3739400D
863A-2.6388006-0.6993640D
863T-5.6320899-3.3654838D
863S-2.7711209-3.6059825D
863Q-4.2768828-3.9074147D
863N-3.8318409-2.3354482D
863M-6.1296856-3.3827474D
863C-6.5535136-4.6222291D
863E-1.1291087-0.3387277D
863D-4.8364488-2.7010239D
863R-1.0168174-4.1777451D
863K-4.4980396-2.7132862D
863H-5.4150687-4.0723297D
864P 0.8823261 1.8559857D
864G 0.6720201 1.3599511D
864Y-0.1301695-0.5660657D
864W 0.5788173 0.7805062D
864F-0.4055272-0.3130177D
864V 1.4461590 2.2089784D
864L 1.1836137 1.9208852D
864I 0.8293930 1.7591695D
864A 1.1369223 2.0226372D
864T 1.4475619 2.3093373D
⋮⋮⋮⋮⋮
909S 0.2636672 2.5828672D
909Q 1.2878565 2.1386790D
909N 0.9882308 1.9037633D
909M 1.1010109 2.1335153D
909C 0.2593156 2.1600574D
909E 1.3383096 1.5110635D
909D-1.5720789-1.3073061D
909R 1.1091144 2.0586316D
909K 1.5141629 2.2983104D
909H 1.2846214 2.1933719D
910P-3.4641825-4.0244476D
910G 0.5113716 1.2942231D
910Y 1.5962951 2.6162566D
910W 1.4848147 2.1430312D
910F 1.3379674 1.9512124D
910V-4.7139707-4.1553262D
910L-0.6404986-0.2942483D
910I-4.6552593-3.8056666D
910A 0.9576387 1.8229694D
910T-3.2459876-2.5391106D
910S-0.4744694 1.1593934D
910Q NA NAD
910N 0.6567146 1.4869589D
910M 0.9430934 1.5960049D
910C 0.3175552 1.3766078D
910E 1.4313338 1.6048646D
910D 1.1818012 2.0644396D
910R-1.2790269-3.6907624D
910K-1.4978718-0.2243074D
910H 0.9969034 1.6815032D
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 48 rows containing non-finite values (`stat_smooth()`).”
Warning message:
“Removed 48 rows containing missing values (`geom_point()`).”
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 48 rows containing non-finite values (`stat_smooth()`).”
Warning message:
“Removed 48 rows containing missing values (`geom_point()`).”
Call:
lm(formula = score_wt ~ score_nonwt, data = Merged_Replication_DMS_wt_nonwt)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3071 -0.5497  0.0199  0.4073  4.0668 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.00067    0.03475  -28.79   <2e-16 ***
score_nonwt  0.92497    0.01518   60.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.01 on 910 degrees of freedom
  (48 observations deleted due to missingness)
Multiple R-squared:  0.8031,	Adjusted R-squared:  0.8028 
F-statistic:  3711 on 1 and 910 DF,  p-value: < 2.2e-16
No description has been provided for this image
In [ ]: